Methods and apparatus for determining relative position, storage media and electronic devices

By fusing the information from the LiDAR and the semantic camera, the method determines the relative position between the objects each sensor identifies and optimizes their matching, addressing the low accuracy of object matching between semantic cameras and LiDAR. This enables accurate recognition of objects such as lane lines and traffic signs and improves the perception capabilities of autonomous vehicles.

CN116430404B · Active · Publication Date: 2026-06-30 · FOSS (HANGZHOU) INTELLIGENT TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
FOSS (HANGZHOU) INTELLIGENT TECH CO LTD
Filing Date
2023-04-13
Publication Date
2026-06-30

Figure CN116430404B_ABST

Abstract

This application discloses a method, apparatus, storage medium, and electronic device for determining relative positions. The method includes: performing semantic recognition on a first point cloud sensed by a lidar on a target vehicle at the current time to obtain first lane line information and first traffic sign recognition system (TSR) information; acquiring second lane line information and second TSR information output by a semantic camera on the target vehicle at the current time; and determining, based on the first lane line information, the first TSR information, the second lane line information, and the second TSR information, the target relative position corresponding to the lidar and the semantic camera at the current time. This technical solution solves the problem in the related art that objects identified by the semantic camera are matched with objects identified by the lidar with low accuracy, and thereby improves that matching accuracy.

Description

Technical Field

[0001] This application relates to the automotive field, and more specifically, to a method and apparatus for determining relative positions, a storage medium, and an electronic device.

Background Technology

[0002] Autonomous vehicles typically perceive objects on the road using sensors such as cameras, LiDAR, and millimeter-wave radar deployed on the vehicle. While cameras can effectively identify the type of object being scanned, they cannot acquire information such as distance and speed. LiDAR, on the other hand, can obtain real-time position and speed information, but it cannot accurately identify the type of object. Therefore, multi-sensor perception fusion technology is currently a popular approach in perception technology. Combining the characteristics of each sensor allows for better identification of the position, speed, and type of scanned objects. To better integrate the characteristics of cameras and LiDAR, it is often necessary to jointly calibrate the relative positions of the same objects identified by LiDAR and semantic cameras, ensuring accurate acquisition of the object's type, position, and speed information.

[0003] In existing technologies, real-time matching between camera data and the point clouds scanned by LiDAR often relies on static calibration with a calibration board. This approach requires external equipment, uses only static data, and requires the calibration board to be repositioned repeatedly. When the vehicle moves and vibrates, the accuracy of the calibration parameters cannot be guaranteed, so the approach tends to be costly and of limited applicability. Furthermore, semantic information is often difficult to extract from LiDAR data, resulting in poor calibration accuracy.

[0004] There is still no effective solution to the problem of low matching accuracy between objects identified by semantic cameras and objects identified by lidar in related technologies.

Summary of the Invention

[0005] This application provides a method and apparatus for determining relative positions, a storage medium, and an electronic device to at least solve the problem in the related art where the accuracy of matching between objects identified by semantic cameras and objects identified by lidar is low.

[0006] According to one embodiment of this application, a method for determining relative position is provided, comprising: performing semantic recognition on a first point cloud sensed by a lidar on a target vehicle at the current time to obtain first lane line information and first traffic sign recognition system (TSR) information, wherein the first lane line information is used to identify lane lines identified by the lidar in the space where the target vehicle is located at the current time, and the first TSR information is used to identify traffic sign objects identified by the lidar in that space at the current time; acquiring second lane line information and second TSR information output by a semantic camera on the target vehicle at the current time, wherein the second lane line information is used to identify lane lines identified by the semantic camera in the space where the target vehicle is located at the current time, and the second TSR information is used to identify traffic sign objects identified by the semantic camera in that space at the current time; and determining, based on the first lane line information, the first TSR information, the second lane line information, and the second TSR information, the target relative position corresponding to the lidar and the semantic camera at the current time, wherein the target relative position represents the relative position between the target object identified by the lidar and the target object identified by the semantic camera, and the target object includes lane lines or traffic sign objects.

[0007] Optionally, the step of performing semantic recognition on the first point cloud sensed by the lidar on the target vehicle at the current time to obtain the first lane line information and the first traffic sign recognition system (TSR) information includes: dividing each point in the first point cloud into a first candidate point cloud and a second candidate point cloud, wherein the position of each point in the first candidate point cloud in the first coordinate system where the lidar is located is on the same plane as the target vehicle, and the position of each point in the second candidate point cloud in the first coordinate system is on a different plane from the target vehicle; obtaining the first lane line information based on the first candidate point cloud, and obtaining the first TSR information based on the second candidate point cloud.
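The division into a first and a second candidate point cloud in [0007] can be sketched as a simple ground/non-ground split. This is a minimal illustration, not the method of this application: it assumes the lidar frame has its z axis pointing up, that the road plane lies at a known height ground_z, and that points are stored as rows of [x, y, z, intensity]. The function name split_ground and its default values are hypothetical.

```python
import numpy as np


def split_ground(points: np.ndarray, ground_z: float = -1.8, tol: float = 0.15):
    """Split (N, 4) lidar points [x, y, z, intensity] into two candidate clouds.

    Assumption: z points up in the lidar frame and the road surface lies at
    roughly z == ground_z (e.g. minus the sensor mounting height, in metres).
    """
    on_road_plane = np.abs(points[:, 2] - ground_z) <= tol
    first_candidate = points[on_road_plane]    # same plane as the vehicle: lane-line candidates
    second_candidate = points[~on_road_plane]  # off the road plane: traffic-sign candidates
    return first_candidate, second_candidate


cloud = np.array([[5.0, 0.0, -1.80, 10.0],    # road surface
                  [6.0, 1.0, -1.75, 80.0],    # painted marking
                  [10.0, 2.0, 3.00, 40.0]])   # lamp head or sign panel
ground, off_ground = split_ground(cloud)
```

A fixed height band only holds on flat roads; a fitted ground plane (for example with RANSAC) would be the usual choice on slopes.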

[0008] Optionally, obtaining the first lane line information based on the first candidate point cloud includes: obtaining the reflection intensity of each point in the first candidate point cloud, wherein the first point cloud carries the reflection intensity of each point; determining points in the first candidate point cloud that satisfy a preset matching condition between the reflection intensity and a preset reflection intensity threshold, thereby obtaining a second point cloud; and fitting the second point cloud to obtain the first lane line information.
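A minimal sketch of the intensity filtering and fitting in [0008], assuming that the "preset matching condition" is simply intensity >= threshold (painted markings usually reflect more strongly than asphalt), that the input holds the points of a single lane line, and that the lane line is modeled as a quadratic y = f(x). The name fit_lane_line, the threshold, and the polynomial degree are illustrative assumptions.

```python
import numpy as np


def fit_lane_line(ground_points: np.ndarray, intensity_threshold: float = 60.0, degree: int = 2):
    """Return polynomial coefficients (highest power first) of y = f(x), or None.

    Assumptions: columns are [x, y, z, intensity]; the matching condition is
    intensity >= threshold; all retained points belong to one lane line.
    """
    mask = ground_points[:, 3] >= intensity_threshold
    second_cloud = ground_points[mask]        # high-reflectance points: candidate markings
    if len(second_cloud) <= degree:
        return None                           # too few points for a stable fit
    return np.polyfit(second_cloud[:, 0], second_cloud[:, 1], degree)
```

Several lane lines in one scan would first need to be separated (for example by lateral offset) before each is fitted.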

[0009] Optionally, obtaining the first TSR information based on the second candidate point cloud includes: when the second candidate point cloud includes N points and N equals 2, taking one of the N points as a first reference point and determining, in the second candidate point cloud, a second reference point whose distance to the first reference point is less than or equal to a target radius, to obtain a third point cloud, wherein the first reference point and the second reference point are used to identify a target traffic sign object of a target type, and fitting the third point cloud to obtain the first TSR information; and/or, when the second candidate point cloud includes the N points and N is a positive integer greater than 2, obtaining the TSR information corresponding to the i-th point by performing the following steps, wherein the first TSR information includes the TSR information corresponding to each of the N points: determining a fourth point cloud in the second candidate point cloud that contains the i-th point, wherein each point in the fourth point cloud is within the target radius of at least one other point in the fourth point cloud, and each point in the fourth point cloud is used to identify the target traffic sign object of the target type; and fitting the fourth point cloud to obtain the TSR information corresponding to the i-th point, where i is a positive integer greater than or equal to 1 and less than or equal to N.
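The neighbour-chaining rule in [0009], where every point of a cluster lies within the target radius of at least one other point of the same cluster, is a form of radius-based Euclidean clustering. The sketch below assumes that reading, covers the N = 2 and N > 2 cases with one loop, and treats "fitting" a cluster as computing its centroid and axis-aligned extent. The function name, the min_points noise filter, and the fitted quantities are hypothetical.

```python
from collections import deque

import numpy as np


def radius_clusters(points: np.ndarray, radius: float = 0.5, min_points: int = 2):
    """Cluster (N, >=3) off-ground points by chaining neighbours within `radius`.

    Each cluster is summarized by a centroid and an axis-aligned extent, a
    hypothetical stand-in for "fitting" a traffic-sign object. Brute-force
    O(N^2) distances keep the sketch short; a k-d tree suits full scans.
    """
    xyz = points[:, :3]
    dist = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    labels = np.full(len(xyz), -1)
    clusters = []
    for seed in range(len(xyz)):
        if labels[seed] != -1:
            continue                          # already assigned to a cluster
        cluster_id = len(clusters)
        labels[seed] = cluster_id
        members, queue = [seed], deque([seed])
        while queue:                          # breadth-first growth through neighbours
            i = queue.popleft()
            for j in np.flatnonzero((dist[i] <= radius) & (labels == -1)):
                labels[j] = cluster_id
                members.append(int(j))
                queue.append(int(j))
        clusters.append(members)

    objects = []
    for members in clusters:
        if len(members) < min_points:
            continue                          # isolated returns are treated as noise
        pts = xyz[members]
        objects.append({
            "centroid": pts.mean(axis=0),
            "extent": pts.max(axis=0) - pts.min(axis=0),
            "num_points": len(members),
        })
    return objects
```

With two street-light poles, as in the street light 1 / street light 2 example of the first point cloud, each pole would come out as one object and a single stray return would be discarded.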

[0010] Optionally, determining the target relative position of the LiDAR and the semantic camera at the current time based on the first lane line information, the first TSR information, the second lane line information, and the second TSR information includes: determining a first preliminary relative position of the LiDAR and the semantic camera at the current time based on the first lane line information and the second lane line information, wherein the first preliminary relative position represents the preliminary relative position, on a horizontal plane, between the target object identified by the LiDAR and the target object identified by the semantic camera, the horizontal plane being the plane where the semantic camera is located; adjusting the first preliminary relative position based on the first TSR information and the second TSR information to obtain the first relative position of the LiDAR and the semantic camera at the current time; and determining the second relative position of the LiDAR and the semantic camera at the current time based on the first TSR information and the second TSR information. The target relative position includes the first relative position and the second relative position. The first relative position represents the relative position, on the horizontal plane, between the target object identified by the LiDAR and the target object identified by the semantic camera; the second relative position represents that relative position on the vertical plane, which is a plane perpendicular to the horizontal plane.
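To show how the two partial results of [0010] might combine into one lidar-to-camera transform, the sketch below assumes the first (horizontal) relative position is parameterized as (tx, ty, yaw) and the second (vertical) relative position as (pitch, dz), composed with the rotation order R = Rz(yaw) · Ry(pitch). Neither the parameterization nor the rotation order is stated in this application; compose_extrinsic is a hypothetical helper.

```python
import numpy as np


def compose_extrinsic(horizontal, vertical) -> np.ndarray:
    """Build a 4x4 homogeneous transform from the two partial estimates.

    horizontal = (tx, ty, yaw): assumed form of the first relative position
    vertical   = (pitch, dz):   assumed form of the second relative position
    """
    tx, ty, yaw = horizontal
    pitch, dz = vertical
    cz, sz = np.cos(yaw), np.sin(yaw)
    cy, sy = np.cos(pitch), np.sin(pitch)
    rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])  # rotation about z (yaw)
    ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])  # rotation about y (pitch)
    transform = np.eye(4)
    transform[:3, :3] = rz @ ry
    transform[:3, 3] = [tx, ty, dz]
    return transform
```

Splitting the estimate this way lets lane lines, which lie on the road, constrain the horizontal terms while raised traffic sign objects constrain the vertical ones.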

[0011] Optionally, determining the first preliminary relative position of the lidar and the semantic camera at the current time based on the first lane line information and the second lane line information includes: starting from a preset first initial relative position, adjusting the first current relative position until a first target condition is met, and determining the first current relative position at which the first target condition is met as the first preliminary relative position. The first target condition includes: the sum of the loss function values of M points is less than or equal to a preset first target threshold, wherein the M points are the points of the lane lines identified by the LiDAR in the space where the target vehicle is located, obtained by projecting the first lane line information onto the second coordinate system where the semantic camera is located according to the first current relative position. The loss function value of each of the M points is obtained as follows: the first projection position of the j-th point in the second coordinate system and the first reference position of the j-th point in the second coordinate system are input into a preset first loss function to obtain the loss function value of the j-th point, which represents the positional difference between the projection position and the reference position of the j-th point in the second coordinate system, where M is a positive integer greater than or equal to 1, and j is a positive integer greater than or equal to 1 and less than or equal to M.
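The iterative adjustment in [0011] can be illustrated with Gauss-Newton least squares on a 2D rigid transform. Assumptions not taken from this application: the first relative position is (tx, ty, yaw) in the horizontal plane, correspondences between the M lidar lane points and their camera reference positions are already known, and the first loss function is the squared Euclidean distance. align_horizontal, project_xy and their defaults are hypothetical.

```python
import numpy as np


def rot2d(theta: float) -> np.ndarray:
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def project_xy(pose: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply the horizontal relative position (tx, ty, yaw) to (M, 2) lidar points."""
    tx, ty, yaw = pose
    return pts @ rot2d(yaw).T + np.array([tx, ty])


def align_horizontal(lidar_xy, camera_xy, init=(0.0, 0.0, 0.0),
                     threshold=1e-12, max_iter=50):
    """Gauss-Newton: adjust (tx, ty, yaw) until the summed loss <= threshold."""
    pose = np.array(init, dtype=float)
    for _ in range(max_iter):
        residuals = project_xy(pose, lidar_xy) - camera_xy  # projection minus reference, (M, 2)
        if np.sum(residuals ** 2) <= threshold:             # first target condition
            break
        yaw = pose[2]
        d_rot = np.array([[-np.sin(yaw), -np.cos(yaw)],
                          [np.cos(yaw), -np.sin(yaw)]])      # derivative of R(yaw)
        jac = np.zeros((len(lidar_xy), 2, 3))
        jac[:, 0, 0] = 1.0                                  # d x' / d tx
        jac[:, 1, 1] = 1.0                                  # d y' / d ty
        jac[:, :, 2] = lidar_xy @ d_rot.T                   # d (x', y') / d yaw
        step, *_ = np.linalg.lstsq(jac.reshape(-1, 3), -residuals.reshape(-1), rcond=None)
        pose += step
    residuals = project_xy(pose, lidar_xy) - camera_xy
    return pose, float(np.sum(residuals ** 2))
```

With known correspondences the problem is well posed. When correspondences come from nearest-point matching along straight lane lines, however, the longitudinal offset is only weakly constrained, which is one plausible reason for refining the preliminary result with TSR points.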

[0012] Optionally, adjusting the first preliminary relative position based on the first TSR information and the second TSR information to obtain the first relative position corresponding to the lidar and the semantic camera at the current time includes: starting from the first preliminary relative position, adjusting the first current relative position until a second target condition is met, and determining the first current relative position at which the second target condition is met as the first relative position. The second target condition includes: the sum of the loss function values of Q points is less than or equal to a second target threshold, wherein the Q points are the points of the traffic sign objects identified by the LiDAR in the space where the target vehicle is located, obtained by projecting the first TSR information onto the second coordinate system according to the first current relative position. The loss function value of each of the Q points is obtained as follows: the second projection position of the p-th point in the second coordinate system and the second reference position of the p-th point in the second coordinate system are input into a preset second loss function to obtain the loss function value of the p-th point, which represents the positional difference between the projection position and the reference position of the p-th point in the second coordinate system, where Q is a positive integer greater than or equal to 1, and p is a positive integer greater than or equal to 1 and less than or equal to Q.
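The refinement in [0012] starts from the preliminary relative position instead of a preset initial value. The sketch below illustrates that warm start with a derivative-free coordinate (pattern) search, chosen here only because it accepts any per-point loss; this application does not name an optimizer. The Huber loss standing in for the "second loss function", the (tx, ty, yaw) parameterization, the step sizes, and all function names are assumptions.

```python
import numpy as np


def huber(d: np.ndarray, delta: float = 1.0) -> np.ndarray:
    """Assumed per-point loss: quadratic near zero, linear for large distances."""
    return np.where(d <= delta, 0.5 * d ** 2, delta * (d - 0.5 * delta))


def transform_xy(pose, pts: np.ndarray) -> np.ndarray:
    tx, ty, yaw = pose
    c, s = np.cos(yaw), np.sin(yaw)
    return pts @ np.array([[c, -s], [s, c]]).T + np.array([tx, ty])


def total_loss(pose, lidar_xy, camera_xy) -> float:
    d = np.linalg.norm(transform_xy(pose, lidar_xy) - camera_xy, axis=1)
    return float(np.sum(huber(d)))


def refine_with_tsr(lidar_xy, camera_xy, preliminary, threshold=1e-6,
                    steps=(0.5, 0.5, 0.05), min_step=1e-6, max_iter=1000):
    """Pattern search on (tx, ty, yaw), warm-started from the preliminary pose."""
    pose = np.array(preliminary, dtype=float)
    step = np.array(steps, dtype=float)
    best = total_loss(pose, lidar_xy, camera_xy)
    for _ in range(max_iter):
        if best <= threshold or step.max() < min_step:   # second target condition or stall
            break
        improved = False
        for k in range(3):                               # try each parameter in turn
            for sign in (1.0, -1.0):
                trial = pose.copy()
                trial[k] += sign * step[k]
                loss = total_loss(trial, lidar_xy, camera_xy)
                if loss < best:
                    pose, best, improved = trial, loss, True
                    break
        if not improved:
            step *= 0.5                                  # shrink the search pattern
    return pose, best
```

A gradient-based solver would converge faster; the pattern search is shown only because the loss can be swapped without deriving Jacobians.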

[0013] Optionally, adjusting the second initial relative position between the lidar and the semantic camera based on the first TSR information and the second TSR information to obtain the second relative position includes: starting from a preset second initial relative position, adjusting the second current relative position until a third target condition is met, and determining the second current relative position at which the third target condition is met as the second relative position. The third target condition includes: the sum of the loss function values of L points is less than or equal to a third target threshold, wherein the L points are the points of the traffic sign objects identified by the LiDAR in the space where the target vehicle is located, obtained by projecting the first TSR information onto the second coordinate system according to the second current relative position. The loss function value of each of the L points is obtained as follows: the second projection position of the q-th point in the second coordinate system and the second reference position of the q-th point in the second coordinate system are input into a preset third loss function to obtain the loss function value of the q-th point, which represents the positional difference between the projection position and the reference position of the q-th point in the second coordinate system, where L is a positive integer greater than or equal to 1, and q is a positive integer greater than or equal to 1 and less than or equal to L.
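For [0013], a minimal sketch assuming the second relative position is (pitch, dz) in the vertical plane and that only the vertical residual of each of the L TSR points is penalized, with squared error as the assumed third loss function. Gauss-Newton is used; the model z' = -x·sin(pitch) + z·cos(pitch) + dz (a rotation about the lateral y axis plus a vertical offset) is an assumed convention, and align_vertical is a hypothetical name.

```python
import numpy as np


def vertical_residuals(params, lidar_xz: np.ndarray, camera_z: np.ndarray) -> np.ndarray:
    """Projected height minus reference height for each (x, z) lidar point."""
    pitch, dz = params
    x, z = lidar_xz[:, 0], lidar_xz[:, 1]
    return -x * np.sin(pitch) + z * np.cos(pitch) + dz - camera_z


def align_vertical(lidar_xz, camera_z, init=(0.0, 0.0), threshold=1e-12, max_iter=20):
    """Gauss-Newton on (pitch, dz) until the summed squared residual <= threshold."""
    params = np.array(init, dtype=float)
    x, z = lidar_xz[:, 0], lidar_xz[:, 1]
    for _ in range(max_iter):
        r = vertical_residuals(params, lidar_xz, camera_z)
        if np.sum(r ** 2) <= threshold:                  # third target condition
            break
        pitch = params[0]
        jac = np.column_stack([-x * np.cos(pitch) - z * np.sin(pitch),  # d r / d pitch
                               np.ones_like(x)])                          # d r / d dz
        step, *_ = np.linalg.lstsq(jac, -r, rcond=None)
        params += step
    r = vertical_residuals(params, lidar_xz, camera_z)
    return params, float(np.sum(r ** 2))
```

Traffic sign objects at several heights and distances make pitch and vertical offset separable, which lane lines on a flat road cannot do.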

[0014] Optionally, after the target relative position corresponding to the lidar and the semantic camera at the current time is determined based on the first lane line information, the first TSR information, the second lane line information, and the second TSR information, the method further includes: performing semantic recognition on a fifth point cloud sensed by the lidar on the target vehicle to obtain fourth lane line information and fourth TSR information, wherein the fourth lane line information is used to identify the lane lines identified by the lidar in the space where the target vehicle is located at the next time after the current time, and the fourth TSR information is used to identify the traffic sign objects identified by the lidar in that space at the next time; acquiring fifth lane line information and fifth TSR information output by the semantic camera on the target vehicle, wherein the fifth lane line information is used to identify the lane lines identified by the semantic camera in the space where the target vehicle is located at the next time, and the fifth TSR information is used to identify the traffic sign objects identified by the semantic camera in that space at the next time; and determining the next relative position of the lidar and the semantic camera at the next time based on the fourth lane line information, the fourth TSR information, the fifth lane line information, the fifth TSR information, and the first TSR information.

[0015] Optionally, determining the next relative position of the LiDAR and the semantic camera at the next time based on the fourth lane line information, the fourth TSR information, the fifth lane line information, the fifth TSR information, and the first TSR information includes: performing a union operation on the points of the traffic sign objects in the first TSR information and the points of the traffic sign objects in the fourth TSR information to obtain sixth TSR information; determining a second preliminary relative position of the LiDAR and the semantic camera at the next time based on the fourth lane line information and the fifth lane line information; adjusting the second preliminary relative position based on the fifth TSR information and the sixth TSR information to obtain a third relative position of the LiDAR and the semantic camera at the next time; and determining a fourth relative position of the LiDAR and the semantic camera at the next time based on the fifth TSR information and the sixth TSR information, wherein the next relative position includes the third relative position and the fourth relative position.
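The union operation in [0015] can be sketched as merging the two sets of traffic sign points and dropping duplicates that fall into the same small voxel. The sketch assumes the earlier points must first be expressed in the lidar frame of the next time through an ego-motion transform (a hypothetical 4x4 matrix, identity by default); this application does not state how the two sets are aligned, and union_tsr_points and the voxel size are illustrative.

```python
import numpy as np


def union_tsr_points(prev_points, curr_points, ego_motion=None, voxel=0.05):
    """Union of two (N, 3) TSR point sets, merging points that share a voxel.

    ego_motion: hypothetical 4x4 transform taking the earlier lidar frame into
    the lidar frame of the next time; identity if not given.
    """
    if ego_motion is None:
        ego_motion = np.eye(4)
    homog = np.column_stack([prev_points, np.ones(len(prev_points))])
    prev_in_curr = (homog @ ego_motion.T)[:, :3]       # earlier points in the new frame
    merged = np.vstack([prev_in_curr, curr_points])
    keys = np.floor(merged / voxel).astype(np.int64)   # voxel index of every point
    _, first_idx = np.unique(keys, axis=0, return_index=True)
    return merged[np.sort(first_idx)]                  # keep first occurrences, original order
```

Accumulating TSR points across time steps in this way gives the later matching steps more traffic sign points than a single sparse scan would provide.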

[0016] According to another embodiment of this application, a relative position determination device is also provided, comprising: a first identification module, configured to perform semantic recognition on a first point cloud sensed by a lidar on a target vehicle at the current time to obtain first lane line information and first traffic sign recognition system (TSR) information, wherein the first lane line information is used to identify lane lines identified by the lidar in the space where the target vehicle is located at the current time, and the first TSR information is used to identify traffic sign objects identified by the lidar in that space at the current time; a first acquisition module, configured to acquire second lane line information and second TSR information output by a semantic camera on the target vehicle at the current time, wherein the second lane line information is used to identify the lane lines recognized by the semantic camera in the space where the target vehicle is located at the current time, and the second TSR information is used to identify the traffic sign objects recognized by the semantic camera in that space at the current time; and a first determining module, configured to determine the target relative position corresponding to the lidar and the semantic camera at the current time based on the first lane line information, the first TSR information, the second lane line information, and the second TSR information, wherein the target relative position represents the relative position between the target object recognized by the lidar and the target object recognized by the semantic camera, and the target object includes lane lines or traffic sign objects.

[0017] According to another aspect of the embodiments of this application, a computer-readable storage medium is also provided, in which a computer program is stored, wherein the computer program is configured to execute the above-described method for determining the relative position when run.

[0018] According to another aspect of the embodiments of this application, an electronic device is also provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the aforementioned method for determining the relative position through the computer program.

[0019] In this embodiment, the relative positions of the same lane lines or traffic sign objects identified by both the LiDAR and the semantic camera can be determined from the lane line information and TSR information that each sensor identifies. The LiDAR point cloud better reflects the position and distance of the identified objects, while the output of the semantic camera better reflects their type. In this way, the positions and distances of the objects scanned by the LiDAR are accurately matched with the objects captured by the semantic camera. This technical solution solves the problem of low matching accuracy between objects identified by the semantic camera and objects identified by the LiDAR in related technologies, and improves that matching accuracy.

Attached Figure Description

[0020] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

[0021] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0022] Figure 1 is a block diagram of the hardware structure of a mobile terminal running a method for determining a relative position according to an embodiment of this application;

[0023] Figure 2 is a flowchart of a method for determining a relative position according to an embodiment of this application;

[0024] Figure 3 is a schematic diagram of a first point cloud according to an embodiment of this application;

[0025] Figure 4 is a schematic diagram of a second point cloud according to an embodiment of this application;

[0026] Figure 5 is a schematic diagram of a fitted lane line according to an embodiment of this application;

[0027] Figure 6 is a schematic diagram illustrating the extraction of TSR information and lane line information according to an embodiment of this application;

[0028] Figure 7 is a flowchart of stitching a local map from point clouds output by a lidar according to an embodiment of this application;

[0029] Figure 8 is a schematic diagram of matching the semantic information of a lidar with the semantic information of a semantic camera according to an embodiment of this application;

[0030] Figure 9 is a schematic diagram of a loss function according to an embodiment of this application;

[0031] Figure 10 is a schematic diagram of matching the semantic information of a lidar with the semantic information of a semantic camera according to an embodiment of this application;

[0032] Figure 11 is a flowchart of a dynamic joint calibration algorithm according to an embodiment of this application;

[0033] Figure 12 is a schematic diagram of semantic registration according to an embodiment of this application;

[0034] Figure 13 is a structural block diagram of a relative position determination device according to an embodiment of this application.

Detailed Implementation

[0035] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present application, and not all embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort should fall within the scope of protection of the present application.

[0036] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0037] The method embodiments provided in this application can be executed on a computer terminal, device terminal, or similar computing device. Taking execution on a computer terminal as an example, Figure 1 is a block diagram of the hardware structure of a mobile terminal running the method for determining a relative position according to an embodiment of this application. As shown in Figure 1, the mobile terminal may include one or more processors 102 (only one is shown in Figure 1; the processor 102 may include, but is not limited to, a processing device such as a microcontroller unit (MCU) or a field-programmable gate array (FPGA)) and a memory 104 for storing data. In one exemplary embodiment, the computer terminal may further include a transmission device 106 for communication functions and an input/output device 108. Those skilled in the art will understand that the structure shown in Figure 1 is merely illustrative and does not limit the structure of the computer terminal described above. For example, the computer terminal may include more or fewer components than shown in Figure 1, or have a configuration different from that shown in Figure 1, with equivalent or additional functions.

[0038] The memory 104 can be used to store computer programs, such as application software programs and modules, like the computer program corresponding to the relative position determination method in this embodiment of the invention. The processor 102 executes various functional applications and data processing by running the computer programs stored in the memory 104, thereby implementing the above-described method. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 104 may further include memory remotely located relative to the processor 102, and these remote memories can be connected to a computer terminal via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.

[0039] The transmission device 106 is used to receive or send data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider for the computer terminal. In one example, the transmission device 106 includes a Network Interface Controller (NIC), which can connect to other network devices via a base station to communicate with the Internet. In another example, the transmission device 106 may be a Radio Frequency (RF) module used for wireless communication with the Internet.

[0040] This embodiment provides a method for determining relative position, applied to the aforementioned computer terminal. Figure 2 is a flowchart of a method for determining a relative position according to an embodiment of this application. As shown in Figure 2, the process includes the following steps:

[0041] Step S202: Perform semantic recognition on the first point cloud sensed by the lidar on the target vehicle at the current time to obtain first lane line information and first traffic sign recognition system (TSR) information. The first lane line information is used to identify the lane line identified by the lidar in the space where the target vehicle is located at the current time, and the first TSR information is used to identify the traffic sign object identified by the lidar in the space where the target vehicle is located at the current time.

[0042] Step S204: Obtain the second lane line information and the second TSR information output by the semantic camera on the target vehicle at the current time, wherein the second lane line information is used to identify the lane line identified by the semantic camera in the space where the target vehicle is located at the current time, and the second TSR information is used to identify the traffic sign object identified by the semantic camera in the space where the target vehicle is located at the current time.

[0043] Step S206: Based on the first lane line information, the first TSR information, the second lane line information, and the second TSR information, determine the target relative position corresponding to the lidar and the semantic camera at the current time. The target relative position is used to represent the relative position between the target object identified by the lidar and the target object identified by the semantic camera. The target object includes lane lines or traffic sign objects.

[0044] Through the above steps, the relative positions of the same lane lines or traffic signs identified by both the LiDAR and the semantic camera can be determined from the lane line information and TSR information that each sensor recognizes. The LiDAR point cloud accurately reflects the position and distance of the identified objects, while the output of the semantic camera accurately reflects their type, so the positions and distances of objects scanned by the LiDAR can be accurately matched with the objects captured by the semantic camera. This technical solution addresses the low matching accuracy between objects identified by the semantic camera and those identified by the LiDAR in related technologies, thus improving the matching accuracy between the two sensors.

[0045] In the technical solution provided in step S202 above, the lidar on the target vehicle may be any of various types of lidar, such as, but not limited to, a low-line-count mechanical lidar, and semantic recognition may be performed, without limitation, on the first point cloud sensed by that lidar at the current time to obtain the first lane line information and the first traffic sign recognition system (TSR) information.

[0046] Optionally, in this embodiment, the first TSR information may be used to identify traffic sign objects that the lidar has identified at the current time in the space where the target vehicle is located, including but not limited to traffic lights, traffic signs, and streetlights.

[0047] Optionally, in this embodiment, the first lane line information may include, but is not limited to, the types of lane lines identified by the LiDAR at the current time in the space where the target vehicle is located, such as stop lines, the lane lines of the lane in which the target vehicle is traveling, and the permitted driving directions of that lane.

[0048] Figure 3 is a schematic diagram of a first point cloud according to an embodiment of this application. As shown in Figure 3, the first point cloud sensed by the lidar may include, but is not limited to, points identifying the lane lines where the target vehicle is located and points identifying the first TSR information. The points identifying the first TSR information may include, but are not limited to, the points identifying street light 1 (points a, b, c, and d) and the points identifying street light 2 (points e, f, and g).

[0049] In an exemplary embodiment, the first lane line information and the first TSR information can be obtained as follows, without being limited thereto: each point in the first point cloud is divided into a first candidate point cloud and a second candidate point cloud, where each point in the first candidate point cloud lies, in the first coordinate system of the lidar, on the same plane as the target vehicle, and each point in the second candidate point cloud lies on a different plane from the target vehicle; the first lane line information is then obtained from the first candidate point cloud, and the first TSR information from the second candidate point cloud.

[0050] Optionally, in this embodiment, each point in the first point cloud identified by the lidar may be divided into a first candidate point cloud and a second candidate point cloud. The first candidate point cloud may be, but is not limited to, a ground point cloud (which may include point clouds used to identify lane lines), and the second candidate point cloud may be, but is not limited to, a non-ground point cloud (which may include point clouds used to identify traffic sign objects). First lane line information is obtained based on the first candidate point cloud, and first TSR information is obtained based on the second candidate point cloud.
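The split into ground and non-ground points can be sketched with a RANSAC plane fit, one common way to realize the ground fitting mentioned in [0060]. This is a minimal sketch under stated assumptions: a z-up lidar frame, a locally planar road, and illustrative values for the inlier distance, iteration count, and verticality check. The function name `segment_ground` and the sample coordinates are hypothetical.

```python
import numpy as np


def segment_ground(points, dist_thresh=0.15, iters=200, min_vertical=0.9, seed=0):
    """Split an (N, 3) lidar cloud into ground / non-ground points by RANSAC plane fitting.

    Assumes a z-up lidar frame. Returns (ground_mask, (normal, d)) where the
    fitted plane satisfies normal @ p + d == 0.
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(iters):
        a, b, c = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(b - a, c - a)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:
            continue  # collinear sample: no unique plane
        normal = normal / norm
        if abs(normal[2]) < min_vertical:
            continue  # reject steep planes (walls, poles); the road should be near-horizontal
        d = -normal @ a
        mask = np.abs(points @ normal + d) <= dist_thresh
        if mask.sum() > best_mask.sum():
            best_mask, best_plane = mask, (normal, d)
    return best_mask, best_plane


# Hypothetical scene: a flat road patch plus a vertical pole.
rng = np.random.default_rng(1)
road = np.column_stack([rng.uniform(-10, 10, 200), rng.uniform(-10, 10, 200), np.zeros(200)])
pole = np.column_stack([np.full(20, 5.0), np.full(20, 2.0), np.linspace(1.0, 5.0, 20)])
cloud = np.vstack([road, pole])
ground_mask, plane = segment_ground(cloud)
first_candidate, second_candidate = cloud[ground_mask], cloud[~ground_mask]
```

The verticality check keeps RANSAC from locking onto a building facade when it happens to contain more points than the visible road.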

[0051] In an exemplary embodiment, the first lane line information can be obtained from the first candidate point cloud as follows, without being limited thereto: obtain the reflection intensity of each point in the first candidate point cloud (the first point cloud carries a reflection intensity for every point); select the points whose reflection intensity satisfies a preset matching condition with respect to a preset reflection intensity threshold, yielding a second point cloud; and fit the second point cloud to obtain the first lane line information.

[0052] Optionally, in this embodiment, the points in the first candidate point cloud whose reflection intensity satisfies the preset matching condition with respect to the preset reflection intensity threshold can be determined. Figure 4 is a schematic diagram of a second point cloud according to an embodiment of this application. As shown in Figure 4, the points whose reflection intensity differs markedly from that of the asphalt road surface (or whichever road surface the target vehicle is driving on, which this application does not limit) can be extracted from the first candidate point cloud to obtain the second point cloud used to identify lane lines; here the reflection intensity of the road surface serves as the preset reflection intensity threshold.
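The intensity-based extraction can be sketched as a simple contrast test. This sketch assumes that the road-surface reference intensity is estimated as the median intensity of the ground points (bare asphalt usually dominates them) and that a fixed contrast margin stands in for the "preset matching condition"; both choices, the margin value, and the function name are assumptions for illustration.

```python
import numpy as np


def extract_lane_points(ground_points, intensity, min_contrast=20.0):
    """Return the second point cloud: ground points noticeably brighter than the road.

    The road reference intensity is assumed to be the median ground intensity.
    """
    ground_points = np.asarray(ground_points, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    road_intensity = np.median(intensity)
    mask = (intensity - road_intensity) >= min_contrast  # assumed matching condition
    return ground_points[mask], road_intensity


# Hypothetical ground points and reflectances (two painted-marking returns).
pts = np.array([[5.0, 1.7, 0.0], [6.0, 0.4, 0.0], [7.0, -0.9, 0.0], [8.0, 1.8, 0.0], [9.0, -1.6, 0.0]])
refl = np.array([60.0, 12.0, 11.0, 65.0, 10.0])
lane_points, road_ref = extract_lane_points(pts, refl)
```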

[0053] Optionally, in this embodiment, lane lines can be fitted from the second point cloud. Figure 5 is a schematic diagram of a fitted lane line according to an embodiment of this application. As shown in Figure 5, the lane lines identified by the second point cloud can be fitted using the RANSAC algorithm (or another algorithm, which this application does not limit), and lane line semantic information (i.e., the first lane line information) is then output. The lane line semantic information carries the position of the lane line points relative to the target vehicle (e.g., to the left or to the right of the target vehicle).
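A minimal sketch of the RANSAC lane-line fit and the left/right labelling follows. It assumes the lane points are given in a vehicle frame with x pointing forward and y pointing left, that each lane line runs roughly along the direction of travel (so y = slope * x + intercept is a usable model; stop lines would need a different parameterization), and that a positive intercept means "left of the vehicle". The threshold, iteration count, function name, and sample data are illustrative.

```python
import numpy as np


def fit_lane_line(xy, dist_thresh=0.1, iters=100, seed=0):
    """RANSAC line fit on (N, 2) bird's-eye lane points, then least-squares refinement.

    Returns (slope, intercept, side, inlier_mask) for y = slope * x + intercept.
    """
    xy = np.asarray(xy, dtype=float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(xy), dtype=bool)
    for _ in range(iters):
        p, q = xy[rng.choice(len(xy), 2, replace=False)]
        direction = q - p
        length = np.linalg.norm(direction)
        if length < 1e-9:
            continue
        normal = np.array([-direction[1], direction[0]]) / length
        inliers = np.abs((xy - p) @ normal) <= dist_thresh  # perpendicular distance test
        if inliers.sum() > best.sum():
            best = inliers
    slope, intercept = np.polyfit(xy[best, 0], xy[best, 1], 1)  # refine on inliers only
    side = "left" if intercept > 0 else "right"
    return slope, intercept, side, best


# Hypothetical data: a lane line 1.75 m to the left plus a few stray bright points.
xs = np.arange(0.0, 21.0)
lane = np.column_stack([xs, np.full_like(xs, 1.75)])
noise = np.array([[3.0, -3.0], [8.0, -2.5], [12.0, 4.0], [15.0, -4.2], [18.0, 3.3]])
slope, intercept, side, inliers = fit_lane_line(np.vstack([lane, noise]))
```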

[0054] In an exemplary embodiment, the first TSR information may be obtained in at least one of the following ways:

[0055] Scenario 1: if the second candidate point cloud includes N points and N equals 2, one of the N points is taken as the center, and a first reference point whose distance from the center point is less than or equal to the target radius is determined in the second candidate point cloud, yielding a third point cloud. The center point and the first reference point are used to identify a target traffic sign object of the target type. The third point cloud is fitted to obtain the first TSR information.

[0056] Optionally, in this embodiment, when the second candidate point cloud includes only 2 points, one point can be taken as the center, and the other point is taken as the first reference point if its distance from the center is less than or equal to the target radius. The center point and the first reference point form the third point cloud, which is fitted to obtain the first TSR information.

[0057] Scenario 2: when the second candidate point cloud includes N points and N is a positive integer greater than 2, the following steps are performed for the i-th point to obtain its TSR information, where the first TSR information includes the TSR information corresponding to each of the N points: a fourth point cloud containing the i-th point is determined in the second candidate point cloud, such that each point in the fourth point cloud is within the target radius of at least one other point in the fourth point cloud and each point identifies a target traffic sign object of the target type; the fourth point cloud is then fitted to obtain the TSR information corresponding to the i-th point, where i is a positive integer with 1 ≤ i ≤ N.

[0058] Optionally, in this embodiment, when the second candidate point cloud includes more than two points, a point is selected from the non-ground point cloud (i.e., the second candidate point cloud) and a search is made for other points within a radius of 1 m of it (the target radius, which can be adjusted according to actual needs and is not restricted by this application). Any such points are grouped with it into one cluster. The search is then repeated around every point already in the cluster until the number of points in the cluster stops increasing, at which point clustering of that cluster is complete. Next, a point is selected from the points in the second candidate point cloud that have not yet been clustered, and the same procedure is repeated until every point in the second candidate point cloud has been searched.
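The region-growing procedure above is Euclidean clustering. A minimal sketch with a brute-force neighbour search is given below (a production implementation would normally use a k-d tree). The 1 m radius follows the text; `min_size`, the function name, and the street-light coordinates are hypothetical. The two-point case of Scenario 1 is handled by the same loop.

```python
import numpy as np
from collections import deque


def euclidean_cluster(points, radius=1.0, min_size=1):
    """Group points connected by chains of neighbours no farther apart than `radius`."""
    points = np.asarray(points, dtype=float)
    labels = np.full(len(points), -1, dtype=int)  # -1 = not yet clustered
    n_clusters = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        labels[seed] = n_clusters
        frontier = deque([seed])
        while frontier:  # grow until the cluster stops gaining points
            idx = frontier.popleft()
            dists = np.linalg.norm(points - points[idx], axis=1)
            new = np.flatnonzero((dists <= radius) & (labels == -1))
            labels[new] = n_clusters
            frontier.extend(new.tolist())
        n_clusters += 1
    clusters = [np.flatnonzero(labels == k) for k in range(n_clusters)]
    return [c for c in clusters if len(c) >= min_size]


# Made-up coordinates for street light 1 (points a-d) and street light 2 (points e-g).
non_ground = np.array([
    [10.0, 3.0, 1.0], [10.0, 3.0, 1.8], [10.0, 3.0, 2.6], [10.0, 3.0, 3.4],
    [25.0, -3.0, 1.0], [25.0, -3.0, 1.9], [25.0, -3.0, 2.8],
])
clusters = euclidean_cluster(non_ground)
```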

[0059] Optionally, in this embodiment, points identifying traffic sign objects can be extracted from the non-ground point cloud (i.e., the second candidate point cloud) using the Euclidean clustering algorithm and the RANSAC algorithm (or other algorithms, which this application does not limit). Figure 6 is a schematic diagram illustrating the extraction of TSR information and lane line information according to an embodiment of this application. As shown in Figure 6, the point cloud sensed by the lidar (i.e., the first point cloud) can be acquired; the lidar odometry pose from SLAM (simultaneous localization and mapping) and the GNSS (Global Navigation Satellite System) pose are fused and optimized to obtain accurate odometry information; and the odometry information is combined with the lidar ground point cloud (i.e., the first candidate point cloud) and non-ground point cloud (i.e., the second candidate point cloud) to stitch together a ground point cloud map and a non-ground point cloud map.

[0060] Figure 7 is a flowchart illustrating the stitching of a local map from the point cloud output by a LiDAR, according to an embodiment of this application. As shown in Figure 7, distortion compensation can be performed on the first point cloud sensed by the lidar using an IMU (Inertial Measurement Unit). The ground point cloud and the non-ground point cloud are segmented by ground fitting. The corrected point cloud is then registered to obtain lidar odometry information. The lidar odometry pose and the GNSS pose are fused and optimized to obtain accurate odometry information. Based on this odometry information, the lidar ground and non-ground point clouds are stitched into a ground point cloud map and a non-ground point cloud map.
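Once fused odometry poses are available, stitching a local map amounts to transforming every scan into a common frame and concatenating the results. A minimal sketch, assuming each pose is a 4x4 homogeneous scan-to-map transform (built here from planar x, y, yaw for brevity); IMU de-skewing, registration, and the lidar/GNSS fusion themselves are not shown, and the helper names are hypothetical.

```python
import numpy as np


def planar_pose(x, y, yaw):
    """4x4 scan-to-map transform for a planar pose (a simplification of a full 6-DoF pose)."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    T[:2, 3] = [x, y]
    return T


def stitch(scans, poses):
    """Transform each (N_k, 3) scan by its pose and stack the results into one map cloud."""
    stitched = []
    for pts, T in zip(scans, poses):
        pts = np.asarray(pts, dtype=float)
        homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
        stitched.append((homogeneous @ T.T)[:, :3])
    return np.vstack(stitched)


# The same physical point seen from two vehicle poses lands at one map location.
scan_0 = np.array([[3.0, 1.0, 0.0]])
scan_1 = np.array([[1.0, 1.0, 0.0]])
ground_map = stitch([scan_0, scan_1], [planar_pose(0.0, 0.0, 0.0), planar_pose(2.0, 0.0, 0.0)])
```

The same `stitch` call is applied separately to the ground and non-ground point clouds to produce the two maps.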

[0061] A point cloud representing lane lines (the second point cloud) can be extracted from the ground point cloud and fitted to obtain lane line semantic information (i.e., the first lane line information). Likewise, point clouds representing traffic sign objects can be clustered and segmented from the non-ground point cloud and fitted to obtain the first TSR information for traffic sign objects, including streetlights (e.g., streetlight poles), traffic lights, and traffic signs.

[0062] In the technical solution provided in step S204 above, the second lane line information and the second TSR information output by the semantic camera on the target vehicle at the current time are obtained. The second lane line information identifies the types of lane lines (e.g., stop lines, the lane lines of the lane in which the target vehicle is traveling) recognized by the semantic camera at the current time in the space where the target vehicle is located. The second TSR information identifies the traffic sign objects (including but not limited to streetlights (e.g., streetlight poles), traffic lights, and traffic signs) recognized by the semantic camera in the same space at the current time.

[0063] In the technical solution provided in step S206 above, the relative position between the lane line or traffic sign object identified by the LiDAR and the lane line or traffic sign object identified by the semantic camera can be determined from the first lane line information and first TSR information identified by the LiDAR together with the second lane line information and second TSR information identified by the semantic camera. Because the LiDAR point cloud better reflects the position and distance of an identified object, while the semantic camera output better reflects its type, this approach improves the matching accuracy between objects identified by the semantic camera and objects identified by the LiDAR.

[0064] Optionally, in this embodiment, the relative position between the target object identified by the LiDAR and the target object identified by the semantic camera can be represented by the extrinsic parameters between the LiDAR and the semantic camera. Once the target relative position at the current time is determined, the position and distance of a target object identified by the LiDAR can be accurately projected onto the corresponding target object identified by the semantic camera, which improves the safety of autonomous driving for the target vehicle.
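Projecting LiDAR-detected objects onto the camera's detections with the extrinsic parameters x, y, z, roll, pitch, yaw can be sketched as below. The Z-Y-X (yaw-pitch-roll) rotation order, the reading of the extrinsics as a lidar-to-camera-body transform, the body-frame axes (x forward, y left, z up), the fixed rotation into the optical frame (x right, y down, z forward), and the pinhole intrinsic matrix `K` are all assumptions for illustration, not details given in this application.

```python
import numpy as np


def rotation_zyx(roll, pitch, yaw):
    """Rotation matrix R = Rz(yaw) @ Ry(pitch) @ Rx(roll)."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return rz @ ry @ rx


# Camera body frame (x forward, y left, z up) -> optical frame (x right, y down, z forward).
BODY_TO_OPTICAL = np.array([[0.0, -1.0, 0.0], [0.0, 0.0, -1.0], [1.0, 0.0, 0.0]])


def project_to_image(points_lidar, extrinsic, K):
    """Project (N, 3) lidar points to pixels; returns (uv, in_front_mask)."""
    x, y, z, roll, pitch, yaw = extrinsic
    R = rotation_zyx(roll, pitch, yaw)
    body = np.asarray(points_lidar, dtype=float) @ R.T + np.array([x, y, z])
    cam = body @ BODY_TO_OPTICAL.T
    in_front = cam[:, 2] > 0.0  # drop points behind the image plane
    uvw = cam[in_front] @ K.T
    return uvw[:, :2] / uvw[:, 2:3], in_front


K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
pole_points = np.array([[10.0, 0.0, 0.0], [10.0, 1.0, 0.0], [-5.0, 0.0, 0.0]])
uv, visible = project_to_image(pole_points, (0.0, 0.0, 0.0, 0.0, 0.0, 0.0), K)
```

A point 1 m to the left at 10 m range lands 100 px left of the principal point, as expected for a 1000 px focal length.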

[0065] In an exemplary embodiment, the target relative position corresponding to the lidar and the semantic camera at the current time can be determined as follows, without being limited thereto: determine, based on the first lane line information and the second lane line information, a first preliminary relative position corresponding to the lidar and the semantic camera at the current time, where the first preliminary relative position represents the preliminary relative position, on a horizontal plane, between the target object identified by the lidar and the target object identified by the semantic camera, the horizontal plane being the plane in which the semantic camera is located; adjust the first preliminary relative position based on the first TSR information and the second TSR information to obtain the first relative position corresponding to the lidar and the semantic camera at the current time; and determine, based on the first TSR information and the second TSR information, the second relative position corresponding to the lidar and the semantic camera at the current time. The target relative position includes the first relative position and the second relative position. The first relative position represents the relative position of the target object identified by the lidar and the target object identified by the semantic camera on the horizontal plane, and the second relative position represents their relative position on the vertical plane, which is perpendicular to the horizontal plane.

[0066] Optionally, in this embodiment, lane lines can be assumed to be consistently present on the road on which the target vehicle travels, so the lane line information identified by the LiDAR can be matched with the lane line information output by the semantic camera. Classifying the lane lines from both sensors distinguishes left lane lines from right lane lines, which enables coarse alignment.

[0067] Optionally, in this embodiment, the first preliminary relative position of the LiDAR and the semantic camera at the current time can be determined from the first lane line information and the second lane line information. The first preliminary relative position can be represented by the extrinsic parameters y, z, and yaw (yaw angle); that is, the extrinsic parameters y, z, and yaw between the LiDAR and the semantic camera are initially adjusted using the first and second lane line information to obtain their preliminary values.

[0068] Optionally, in this embodiment, after the lane line information identified by the LiDAR has been matched with the lane line information output by the semantic camera to obtain a subset of the parameters (i.e., y, z, yaw), this subset is substituted into the matching of the TSR information identified by the LiDAR with the TSR information output by the semantic camera. The two sets of TSR information are coarsely aligned (the LiDAR TSR information can be characterized by longitudinal distance, and the semantic camera TSR information by its pixel distribution), and the extrinsic parameters between the LiDAR and the semantic camera are adjusted; these may include, but are not limited to, x, roll, pitch, y, z, and yaw.

[0069] Optionally, in this embodiment, after the first preliminary relative position is obtained, the extrinsic parameters y, z, and yaw representing it can be adjusted according to the first TSR information and the second TSR information to obtain the first relative position corresponding to the lidar and the semantic camera at the current time. The second relative position at the current time can then be determined according to the first TSR information and the second TSR information; the second relative position can be represented by the extrinsic parameters x, roll, and pitch.

[0070] Figure 8 is a schematic diagram illustrating the matching of the semantic information of a lidar with that of a semantic camera according to an embodiment of this application. As shown in Figure 8, the LiDAR scans at t0, t1, ..., tn within period T0, producing local maps SACN_0, SACN_1, ..., SACN_n. The semantic camera captures image_0 at time T0. SACN_0, SACN_1, ..., SACN_n can be matched with image_0 to obtain the extrinsic parameters between the semantic camera and the LiDAR at time T0, ... Similarly, the LiDAR scans at tn+1, tn+2, ..., tn+m within period Tn, producing local maps SACN_n+1, SACN_n+2, ..., SACN_m; the semantic camera captures image_n at time Tn; and SACN_n+1, SACN_n+2, ..., SACN_m can be matched with image_n to obtain the extrinsic parameters between the semantic camera and the LiDAR at time Tn.
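Grouping the LiDAR scans of each window with the camera frame that closes it, as in Figure 8, can be sketched with a binary search over the camera timestamps. The sketch assumes sorted timestamps in seconds and assigns scan times in (T_{k-1}, T_k] to camera frame k; scans newer than the last camera frame are left for the next window. This boundary rule and the function name are assumptions.

```python
from bisect import bisect_left
from collections import defaultdict


def group_scans(lidar_times, camera_times):
    """Map camera frame index k -> indices of lidar scans with T_{k-1} < t <= T_k."""
    windows = defaultdict(list)
    for i, t in enumerate(lidar_times):
        k = bisect_left(camera_times, t)  # first camera frame at or after this scan
        if k < len(camera_times):
            windows[k].append(i)
    return dict(windows)


lidar_times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
camera_times = [0.45, 0.95]
windows = group_scans(lidar_times, camera_times)
```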

[0071] In an exemplary embodiment, the first preliminary relative position corresponding to the lidar and the semantic camera at the current time can be determined as follows, without being limited thereto: starting from a preset first initial relative position, the first current relative position is adjusted until a first target condition is met, and the first current relative position at that moment is taken as the first preliminary relative position. The first target condition includes: the sum of the loss function values of M points is less than or equal to a preset first target threshold, where the M points are the points of the lane line identified by the lidar in the space where the target vehicle is located, obtained by projecting the first lane line information onto the second coordinate system of the semantic camera according to the first current relative position. The loss function value of each of the M points is obtained as follows: the first projection position of the j-th point in the second coordinate system and the first reference position of the j-th point in the second coordinate system are input into a preset first loss function to obtain the loss function value of the j-th point, which represents the positional difference between the projection position and the reference position of the j-th point in the second coordinate system, where M is a positive integer greater than or equal to 1, and j is a positive integer with 1 ≤ j ≤ M.

[0072] Optionally, in this embodiment, the first preliminary relative position may be, but is not limited to, the relative position between the location where the LiDAR is mounted on the target vehicle and the location where the semantic camera is mounted on the target vehicle. The quantity being minimized may be, but is not limited to, the point-to-line distance. Based on the current extrinsic parameters (i.e., the first current relative position), the lane line information identified by the LiDAR is projected onto the coordinate system of the lane line information output by the semantic camera, and the projected lane line is discretized into M points. The distance from each of these M points to the lane line output by the semantic camera is computed, and the algorithm converges when the sum of these distances is minimized. Figure 9 is a schematic diagram of a loss function according to an embodiment of this application. As shown in Figure 9, the projections of points i, l, and j into the coordinate system of the semantic camera are X(M,i), X(M,l), and X(M,j), and the line formed by X(M,i), X(M,l), and X(M,j) represents the visual semantics. The loss function (i.e., the first loss function) is calculated as follows:

[0073]

[0074] After obtaining the loss function (i.e., the first loss function), nonlinear optimization can be performed using the Levenberg-Marquardt (LM) method.
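The lane-line stage can be sketched as follows, assuming the first loss function is a squared signed point-to-line distance and minimizing it with a small hand-written Levenberg-Marquardt loop that uses a finite-difference Jacobian. For brevity the sketch works in a bird's-eye-view plane and estimates only y and yaw: along straight lanes x is unobservable (consistent with x being left to the TSR stage), and z is omitted in this planar simplification. The line parameterization a*x + b*y + c = 0, the damping schedule, the function names, and the synthetic data are assumptions.

```python
import numpy as np


def transform_bev(points, y, yaw):
    """Apply a planar rigid transform (x offset fixed at 0) to (N, 2) lidar lane points."""
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s], [s, c]])
    return points @ R.T + np.array([0.0, y])


def lane_residuals(params, lidar_pts, line_ids, cam_lines):
    """Signed distance of each projected lidar point to its associated camera lane line.

    cam_lines[k] = (a, b, c) with a**2 + b**2 == 1, describing a*x + b*y + c = 0.
    """
    y, yaw = params
    proj = transform_bev(lidar_pts, y, yaw)
    lines = cam_lines[line_ids]
    return proj[:, 0] * lines[:, 0] + proj[:, 1] * lines[:, 1] + lines[:, 2]


def levenberg_marquardt(residual_fn, x0, iters=50, lam=1e-3, eps=1e-7):
    """Minimize sum(residual_fn(x)**2) with a basic LM loop."""
    x = np.asarray(x0, dtype=float)
    r = residual_fn(x)
    cost = r @ r
    for _ in range(iters):
        J = np.empty((r.size, x.size))
        for k in range(x.size):  # forward-difference Jacobian
            step = np.zeros_like(x)
            step[k] = eps
            J[:, k] = (residual_fn(x + step) - r) / eps
        A, g = J.T @ J, J.T @ r
        delta = np.linalg.solve(A + lam * np.eye(x.size), -g)
        r_new = residual_fn(x + delta)
        if r_new @ r_new < cost:  # accept: behave more like Gauss-Newton
            x, r, cost = x + delta, r_new, r_new @ r_new
            lam *= 0.3
        else:  # reject: behave more like gradient descent
            lam *= 10.0
    return x, cost


# Synthetic check: camera lanes at y = +1.75 (left) and y = -1.75 (right).
true_y, true_yaw = 0.3, 0.02
cam_lines = np.array([[0.0, 1.0, -1.75], [0.0, 1.0, 1.75]])
xs = np.linspace(5.0, 30.0, 12)
world = np.vstack([np.column_stack([xs, np.full_like(xs, 1.75)]),
                   np.column_stack([xs, np.full_like(xs, -1.75)])])
line_ids = np.repeat([0, 1], len(xs))  # association from the left/right classification
c, s = np.cos(true_yaw), np.sin(true_yaw)
lidar_pts = (world - np.array([0.0, true_y])) @ np.array([[c, -s], [s, c]])  # inverse transform
est, final_cost = levenberg_marquardt(
    lambda p: lane_residuals(p, lidar_pts, line_ids, cam_lines), [0.0, 0.0])
```

Squared distances are used because LM minimizes a sum of squared residuals; the "sum of distances" criterion in the text would need a robust or reweighted variant.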

[0075] In an exemplary embodiment, the first preliminary relative position can be adjusted as follows, without being limited thereto, to obtain the first relative position corresponding to the lidar and the semantic camera at the current time: starting from the first preliminary relative position, the first current relative position is adjusted until a second target condition is met, and the first current relative position at that moment is taken as the first relative position. The second target condition includes: the sum of the loss function values of Q points is less than or equal to a second target threshold, where the Q points are the points of traffic sign objects identified by the lidar in the space where the target vehicle is located, obtained by projecting the first TSR information onto the second coordinate system according to the first current relative position. The loss function value of each of the Q points is obtained as follows: the second projection position of the p-th point in the second coordinate system and the second reference position of the p-th point in the second coordinate system are input into a preset second loss function to obtain the loss function value of the p-th point, which represents the positional difference between the projection position and the reference position of the p-th point in the second coordinate system, where Q is a positive integer greater than or equal to 1, and p is a positive integer with 1 ≤ p ≤ Q.

[0076] Optionally, in this embodiment, the second loss function may be, but is not limited to, the same loss function as the first loss function, or a different loss function; the first target threshold may be, but is not limited to, the same threshold as the second target threshold, or a different threshold.

[0077] Optionally, in this embodiment, the values of y, z, and yaw obtained from the lane line information identified by the LiDAR and by the semantic camera can be adjusted again based on the first TSR information and the second TSR information. This makes full use of the traffic sign objects scanned by the LiDAR and identified by the semantic camera, and thereby improves the accuracy of the extrinsic parameters y, z, and yaw between the LiDAR and the semantic camera.

[0078] In an exemplary embodiment, the second relative position between the lidar and the semantic camera can be obtained as follows, without being limited thereto: starting from a preset second initial relative position, the second current relative position is adjusted until a third target condition is met, and the second current relative position at that moment is taken as the second relative position. The third target condition includes: the sum of the loss function values of L points is less than or equal to a third target threshold, where the L points are the points of traffic sign objects identified by the lidar in the space where the target vehicle is located, obtained by projecting the first TSR information onto the second coordinate system according to the second current relative position. The loss function value of each of the L points is obtained as follows: the projection position of the q-th point in the second coordinate system and the reference position of the q-th point in the second coordinate system are input into a preset third loss function to obtain the loss function value of the q-th point, which represents the positional difference between the projection position and the reference position of the q-th point in the second coordinate system, where L is a positive integer greater than or equal to 1, and q is a positive integer with 1 ≤ q ≤ L.

[0079] Optionally, in this embodiment, the third loss function may be, but is not limited to, the same loss function as the first loss function and the second loss function, or a different loss function; the third target threshold may be, but is not limited to, the same threshold as the second target threshold and the first target threshold, or a different threshold, etc.

[0080] Optionally, in this embodiment, the extrinsic parameters x, roll, and pitch between the LiDAR and the semantic camera can be adjusted based on the first TSR information and the second TSR information until the third target condition is met. This indicates that the extrinsic parameters x, roll, and pitch between the LiDAR and the semantic camera have been adjusted. In this case, by adjusting the extrinsic parameters y, z, yaw, x, roll, and pitch between the LiDAR and the semantic camera, the target object identified by the LiDAR (e.g., the location of the target object, the distance between the target object and the target vehicle) can be accurately matched with the target object identified by the semantic camera (e.g., the type of the target object), thereby improving the accuracy of matching the target object identified by the LiDAR with the target object identified by the semantic camera.

[0081] To better understand the process of determining the relative position described above, the following description, in conjunction with optional embodiments, further illustrates the process of determining the relative position, but is not intended to limit the technical solutions of the embodiments of this application.

[0082] Figure 10 is a schematic diagram illustrating the matching of the semantic information of a lidar with that of a semantic camera according to an embodiment of this application. As shown in Figure 10, the semantic information identified by the LiDAR can be semantically associated with the semantic information output by the semantic camera, and lane lines can be fitted from both to construct the corresponding loss function (i.e., the first loss function). This loss function is then optimized nonlinearly with the Levenberg-Marquardt (LM) method to obtain the initially adjusted extrinsic parameters part_1 (i.e., y, z, and yaw representing the first preliminary relative position).

[0083] Next, the initially adjusted extrinsic parameters part_1 are substituted into the matching of the TSR information identified by the LiDAR with the TSR information output by the semantic camera. TSR information (which may include, but is not limited to, light poles, road signs, and stop lines) is fitted from the semantic information of both sensors, the corresponding loss function is constructed, and nonlinear optimization is performed with the LM method to obtain the adjusted extrinsic parameters part_2 (i.e., y, z, and yaw representing the first relative position, and x, roll, and pitch representing the second relative position). Once the residuals converge, these adjusted values are taken as the extrinsic parameters between the LiDAR and the semantic camera.
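The two-stage schedule (lane lines first for y, z, yaw as part_1; then TSR matching refining all six extrinsics as part_2) can be expressed by freezing parameters per stage. The sketch below assumes the parameter order (x, y, z, roll, pitch, yaw), uses a plain damped Gauss-Newton step as a stand-in for the LM solver, and uses toy linear residuals in place of the real lane-line and TSR losses; the helper names, target values, and convergence tolerance are hypothetical.

```python
import numpy as np

# Assumed parameter order: (x, y, z, roll, pitch, yaw).
STAGE_1 = (1, 2, 5)           # y, z, yaw from lane-line matching (part_1)
STAGE_2 = (0, 1, 2, 3, 4, 5)  # all six refined from TSR matching (part_2)


def refine_subset(residual_fn, params, free, iters=20, damping=1e-6, eps=1e-7, tol=1e-12):
    """Damped Gauss-Newton on the `free` indices only; the other extrinsics stay fixed."""
    params = np.asarray(params, dtype=float).copy()
    free = list(free)
    for _ in range(iters):
        r = residual_fn(params)
        J = np.empty((r.size, len(free)))
        for col, k in enumerate(free):  # finite-difference Jacobian over free params
            step = np.zeros_like(params)
            step[k] = eps
            J[:, col] = (residual_fn(params + step) - r) / eps
        delta = np.linalg.solve(J.T @ J + damping * np.eye(len(free)), -J.T @ r)
        params[free] += delta
        if np.linalg.norm(delta) < tol:  # vanishing update: treat residuals as converged
            break
    return params


# Toy residuals standing in for the lane-line and TSR losses (hypothetical values).
target = np.array([1.2, 0.3, 1.5, 0.01, -0.02, 0.05])


def lane_residual(p):
    return (p - target)[list(STAGE_1)]  # lane lines only constrain y, z, yaw here


def tsr_residual(p):
    return p - target


part_1 = refine_subset(lane_residual, np.zeros(6), STAGE_1)
part_2 = refine_subset(tsr_residual, part_1, STAGE_2)
```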

[0084] In an exemplary embodiment, the next relative position of the LiDAR and the semantic camera at the next time can be determined as follows, without being limited thereto: perform semantic recognition on a fifth point cloud sensed by the LiDAR on the target vehicle to obtain fourth lane line information and fourth TSR information, where the fourth lane line information identifies the lane lines, and the fourth TSR information identifies the traffic sign objects, recognized by the LiDAR in the space where the target vehicle is located at the next time; acquire fifth lane line information and fifth TSR information output by the semantic camera on the target vehicle, where the fifth lane line information identifies the lane lines, and the fifth TSR information identifies the traffic sign objects, recognized by the semantic camera in that space at the next time; and determine the next relative position of the LiDAR and the semantic camera at the next time based on the fourth lane line information, the fourth TSR information, the fifth lane line information, the fifth TSR information, and the first TSR information.

[0085] Optionally, in this embodiment, the next relative position of the LiDAR and the semantic camera at the next time can be determined from the fourth lane line information and fourth TSR information identified by the LiDAR at the next time, the fifth lane line information and fifth TSR information output by the semantic camera at the next time, and the first TSR information identified by the LiDAR at the current time. In this way, TSR information identified by the LiDAR at multiple times is accumulated, and the LiDAR's historical TSR information can be used to depict traffic sign objects more accurately when determining the next relative position, which improves the accuracy of that determination.

[0086] Optionally, in this embodiment, the current time may be, but is not limited to, a moment or a time period. Accordingly, the fourth lane line information, the fourth TSR information, the fifth lane line information, the fifth TSR information, and the first TSR information may be used to determine either the next relative position of the LiDAR and the semantic camera in the time period following the current time period, or the next relative position at the moment following the current moment.

[0087] In an exemplary embodiment, the next relative position of the lidar and the semantic camera at the next time can be determined as follows, without being limited thereto: take the union of the traffic sign object points in the first TSR information and those in the fourth TSR information to obtain sixth TSR information; determine the second preliminary relative position of the lidar and the semantic camera at the next time based on the fourth lane line information and the fifth lane line information; adjust the second preliminary relative position based on the fifth TSR information and the sixth TSR information to obtain the third relative position of the lidar and the semantic camera at the next time; and determine the fourth relative position of the lidar and the semantic camera at the next time based on the fifth TSR information and the sixth TSR information, where the next relative position includes the third relative position and the fourth relative position.

[0088] Optionally, in this embodiment, the TSR information that is accumulated with the TSR information identified by the LiDAR at the next time may be, but is not limited to, TSR information identified by the LiDAR at times close to the next time. This avoids accumulating TSR information over a long time span, which would occupy excessive storage space, and thus improves storage space utilization.
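One way to bound the accumulated TSR history, sketched here under assumptions, is a fixed-length sliding window over recent LiDAR frames. The class `TsrHistory` and the parameter `max_frames` are hypothetical names, and the window is counted in frames rather than seconds purely for illustration. The points are assumed to be already expressed in one common frame (for example, after compensation with vehicle odometry).

```python
from collections import deque
from typing import Deque, List, Tuple

Point = Tuple[float, float, float]


class TsrHistory:
    """Keeps only the LiDAR TSR points from the most recent `max_frames` times.

    Hypothetical helper: the window length is an assumption, and the points are
    assumed to be already expressed in one common frame.
    """

    def __init__(self, max_frames: int = 5) -> None:
        # A deque with maxlen drops the oldest frame automatically when full,
        # so memory stays bounded however long the vehicle drives.
        self._frames: Deque[List[Point]] = deque(maxlen=max_frames)

    def push(self, tsr_points: List[Point]) -> None:
        """Add the TSR points identified by the LiDAR at one time."""
        self._frames.append(list(tsr_points))

    def accumulated(self) -> List[Point]:
        """Concatenate all retained frames, oldest first."""
        return [p for frame in self._frames for p in frame]


history = TsrHistory(max_frames=2)
history.push([(1.0, 0.0, 2.0)])  # earlier time
history.push([(1.1, 0.0, 2.0)])  # current time
history.push([(1.2, 0.0, 2.0)])  # next time: evicts the earliest frame
print(history.accumulated())  # [(1.1, 0.0, 2.0), (1.2, 0.0, 2.0)]
```

A time-based window (dropping frames older than a fixed number of seconds) would be an equally valid reading of "a time relatively close to the next time step".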

[0089] Optionally, in this embodiment, the TSR information identified by the LiDAR at the current time and the TSR information identified by the LiDAR at the next time may be combined with a union operation, or they may be accumulated directly (for example, by concatenating the two sets of TSR information) without deleting the points that overlap between them. This reduces the difficulty of extracting traffic signs, lampposts, and other traffic sign objects, and improves the accuracy of their extraction.
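The difference between the two combination options can be illustrated with a small sketch. `union_points` and `concat_points` are hypothetical helpers; the source does not define when two points count as overlapping, so treating points that fall in the same voxel cell as duplicates is an assumption.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def concat_points(a: List[Point], b: List[Point]) -> List[Point]:
    """Direct accumulation: keep every point, including overlapping ones."""
    return list(a) + list(b)


def union_points(a: List[Point], b: List[Point], voxel: float = 0.05) -> List[Point]:
    """Union: keep one point per voxel cell, so overlapping points are merged.

    Treating points in the same `voxel`-sized cell as "the same point" is an
    assumption; points straddling a cell boundary may still both be kept.
    """
    seen = set()
    merged: List[Point] = []
    for p in concat_points(a, b):
        key = tuple(round(c / voxel) for c in p)
        if key not in seen:
            seen.add(key)
            merged.append(p)
    return merged


current = [(1.0, 0.0, 2.0), (3.0, 1.0, 2.5)]  # first TSR information
nxt = [(1.01, 0.0, 2.0), (5.0, 0.0, 1.0)]     # fourth TSR information
print(len(concat_points(current, nxt)))  # 4
print(len(union_points(current, nxt)))   # 3
```

Concatenation is cheaper and keeps point density on thin objects such as poles, which is consistent with the stated benefit of easier extraction; the union keeps memory lower.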

[0090] To better understand the process of determining the relative position described above, the following description, in conjunction with optional embodiments, further illustrates the process of determining the relative position, but is not intended to limit the technical solutions of the embodiments of this application.

[0091] Figure 11 is a flowchart of a dynamic joint calibration algorithm according to an embodiment of this application. As shown in Figure 11, the semantic information identified by the LiDAR may be, but is not limited to being, temporally aligned with the semantic information output by the semantic camera. On one hand, the point cloud sensed by the LiDAR is divided into a ground point cloud and a non-ground point cloud. The corrected point cloud is registered to obtain LiDAR odometry information, and the LiDAR odometry pose and the GNSS pose are fused and optimized to obtain accurate odometry information. Based on this odometry information, the LiDAR ground point clouds and non-ground point clouds are combined to build a ground point cloud map and a non-ground point cloud map. Lane line semantics and stop line semantics of the target vehicle's lane are extracted from the ground point cloud, and TSR information of the space where the target vehicle is located is extracted from the non-ground point cloud. On the other hand, lane line semantic windows and TSR semantic windows are extracted from the semantic information output by the semantic camera.
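The temporal alignment step is not specified further in the text. A common approach, assumed here, pairs each LiDAR sweep with the camera output whose timestamp is nearest and rejects the pair if the gap exceeds a tolerance. `nearest_camera_frame` and `max_gap` are hypothetical.

```python
import bisect
from typing import List, Optional


def nearest_camera_frame(lidar_ts: float, camera_ts: List[float],
                         max_gap: float = 0.05) -> Optional[int]:
    """Return the index of the camera frame closest in time to a LiDAR sweep.

    camera_ts must be sorted ascending (seconds). Returns None when no frame
    lies within max_gap, an assumed tolerance.
    """
    i = bisect.bisect_left(camera_ts, lidar_ts)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(camera_ts)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(camera_ts[j] - lidar_ts))
    return best if abs(camera_ts[best] - lidar_ts) <= max_gap else None


camera_ts = [0.0, 0.033, 0.066, 0.1]
print(nearest_camera_frame(0.07, camera_ts))  # 2
print(nearest_camera_frame(0.2, camera_ts))   # None
```

Interpolating camera outputs between two frames would be a finer alternative; nearest-neighbour matching is simply the smallest workable version.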

[0092] Semantic matching may be performed, but is not limited to being performed, between the semantic information identified by the LiDAR and the semantic information output by the semantic camera: lane lines are fitted from both sets of semantic information, a corresponding loss function (that is, the first loss function described above) is constructed, and Levenberg-Marquardt (LM) nonlinear optimization is performed to obtain the initially adjusted extrinsic parameters part_1 (namely y, z, and yaw, which represent the first preliminary relative position described above).
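The text does not give the form of the first loss function. As an illustration only, assume the semantic camera reports each lane line as a cubic polynomial y = c0 + c1*x + c2*x^2 + c3*x^3 in its own frame (a common output format, but an assumption here), and that the LiDAR lane points have already been transformed with the current candidate extrinsics. The residual of a point is then its lateral offset from the camera lane, and the loss is the sum of squared residuals. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def lane_poly(coeffs: Sequence[float], x: float) -> float:
    """Evaluate y = c0 + c1*x + c2*x**2 + c3*x**3 with Horner's rule."""
    y = 0.0
    for c in reversed(coeffs):
        y = y * x + c
    return y


def lane_residuals(points: Sequence[Tuple[float, float]],
                   coeffs: Sequence[float]) -> List[float]:
    """Lateral offset of each projected LiDAR lane point from the camera lane."""
    return [py - lane_poly(coeffs, px) for px, py in points]


def lane_loss(points: Sequence[Tuple[float, float]],
              coeffs: Sequence[float]) -> float:
    """Sum of squared residuals: the quantity an LM solver would minimise."""
    return sum(r * r for r in lane_residuals(points, coeffs))


camera_lane = [1.5, 0.5, 0.0, 0.0]     # y = 1.5 + 0.5 x (illustrative)
lidar_lane = [(2.0, 2.5), (4.0, 3.0)]  # already in camera coordinates
print(lane_loss(lidar_lane, camera_lane))  # 0.25
```

An LM solver would minimise `lane_loss` over (y, z, yaw) by re-projecting the LiDAR lane points at each iteration.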

[0093] Next, the initially adjusted extrinsic parameters part_1 are substituted into the matching between the TSR information identified by the LiDAR and the TSR information output by the semantic camera. TSR objects (which may include, but are not limited to, light poles, road signs, and stop lines) are fitted from the semantic information identified by the LiDAR and the semantic information output by the semantic camera, a corresponding loss function is constructed, and LM nonlinear optimization is performed to obtain the adjusted extrinsic parameters part_2 (namely y, z, and yaw, which represent the first relative position, and x, roll, and pitch, which represent the second relative position). Once the residuals converge, the adjusted extrinsic parameters between the LiDAR and the semantic camera are determined and the calibration ends.
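The two-stage schedule of [0092] and [0093] can be sketched with a small hand-written Levenberg-Marquardt loop. This is an illustration under several assumptions, not the applicant's implementation: correspondences between LiDAR and camera objects are taken as known, residuals are point-to-point, the pose maps LiDAR points into the camera frame with a Z-Y-X rotation order, and the assignment of (y, z, yaw) to parameter indices 1, 2 and 5 is chosen for illustration. Stage 1 frees only y, z and yaw using lane points (part_1); stage 2 starts from part_1 and frees all six parameters using TSR points (part_2).

```python
import numpy as np


def rotation(roll: float, pitch: float, yaw: float) -> np.ndarray:
    """R = Rz(yaw) @ Ry(pitch) @ Rx(roll) (assumed Z-Y-X convention)."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return rz @ ry @ rx


def transform(params: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map (N, 3) LiDAR points into the camera frame; params = x, y, z, roll, pitch, yaw."""
    x, y, z, roll, pitch, yaw = params
    return pts @ rotation(roll, pitch, yaw).T + np.array([x, y, z])


def levenberg_marquardt(params, free, src, dst, iters=100, lam=1e-3):
    """Refine only the parameter indices in `free`, minimising ||T(src) - dst||^2."""
    p = np.asarray(params, dtype=float).copy()

    def residual(q):
        return (transform(q, src) - dst).ravel()

    for _ in range(iters):
        r = residual(p)
        # Forward-difference Jacobian with respect to the free parameters only.
        jac = np.empty((r.size, len(free)))
        for k, idx in enumerate(free):
            dp = p.copy()
            dp[idx] += 1e-6
            jac[:, k] = (residual(dp) - r) / 1e-6
        a = jac.T @ jac
        step = -np.linalg.solve(a + lam * np.eye(len(free)), jac.T @ r)
        cand = p.copy()
        cand[free] += step
        r_cand = residual(cand)
        if r_cand @ r_cand < r @ r:
            p, lam = cand, lam * 0.5  # accept: behave more like Gauss-Newton
        else:
            lam *= 10.0               # reject: behave more like gradient descent
        if np.linalg.norm(step) < 1e-12:
            break
    return p


def calibrate(lane_src, lane_dst, tsr_src, tsr_dst):
    """Stage 1 on lane lines (y, z, yaw), stage 2 on TSR objects (all six)."""
    part_1 = levenberg_marquardt(np.zeros(6), [1, 2, 5], lane_src, lane_dst)
    part_2 = levenberg_marquardt(part_1, [0, 1, 2, 3, 4, 5], tsr_src, tsr_dst)
    return part_1, part_2
```

Fed with correspondences generated from a known pose, stage 2 should recover that pose; real data would additionally need data association and outlier handling, which the sketch omits.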

[0094] By adjusting the extrinsic parameters between the LiDAR and the semantic camera, the position of a target object identified by the LiDAR in the space where the target vehicle is located, together with the distance between that target object and the target vehicle, can be projected relatively accurately onto the corresponding target object output by the semantic camera. Figure 12 is a schematic diagram of semantic registration according to an embodiment of this application. As shown in Figure 12, before registration, when the target objects identified by the LiDAR (which may include, but are not limited to, the driving direction signs of the lane where the target vehicle is located, streetlights, and traffic signs) are projected into the coordinate system of the semantic camera (with origin O and axes x, y, and z), the projected objects (dashed lines) do not overlap with the target objects output by the semantic camera (solid lines). With the extrinsic parameters calibrated by the method of this embodiment, the projected LiDAR objects substantially overlap with the target objects output by the semantic camera, so that the point cloud sensed by the LiDAR is fully utilized.
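Whether the projected LiDAR objects "substantially overlap" the camera objects could be quantified, for example, by the mean distance from each projected LiDAR object to its nearest camera object. This metric and the function names are assumptions for illustration; the source judges the overlap visually in Figure 12.

```python
import numpy as np


def to_camera_frame(pts_lidar: np.ndarray, rot: np.ndarray, trans: np.ndarray) -> np.ndarray:
    """Apply the LiDAR-to-camera extrinsic: p_cam = rot @ p_lidar + trans."""
    return pts_lidar @ rot.T + trans


def mean_overlap_error(lidar_objs: np.ndarray, cam_objs: np.ndarray,
                       rot: np.ndarray, trans: np.ndarray) -> float:
    """Mean distance from each projected LiDAR object to its nearest camera object."""
    proj = to_camera_frame(lidar_objs, rot, trans)
    # Pairwise distances, shape (num_lidar_objs, num_cam_objs).
    dists = np.linalg.norm(proj[:, None, :] - cam_objs[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())


lidar_objs = np.array([[10.0, 2.0, 0.0], [20.0, -3.0, 4.0]])  # e.g. a sign and a pole
cam_objs = lidar_objs + np.array([0.3, 0.0, 0.0])             # offset by 0.3 m
eye = np.eye(3)
print(mean_overlap_error(lidar_objs, cam_objs, eye, np.zeros(3)))            # about 0.3
print(mean_overlap_error(lidar_objs, cam_objs, eye, np.array([0.3, 0, 0])))  # 0.0
```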

[0095] With the embodiments of this application, when the extrinsic parameters between the semantic camera and the LiDAR are calibrated, on the one hand, no calibration board is required, which saves manpower and resources; on the other hand, the method can adapt to 3D LiDARs of different types, including low-line-count 3D LiDARs, and can better extract the lane line information and traffic sign semantic information carried in the point cloud sensed by the LiDAR. Different semantics are used to estimate different parameters, making full use of the lane line information and the TSR information identified by the LiDAR and yielding more accurate extrinsic parameters between the LiDAR and the semantic camera.

[0096] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods according to the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk) and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, or network device, etc.) to execute the methods of the various embodiments of this application.

[0097] Figure 13 is a structural block diagram of a relative position determination device according to an embodiment of this application. As shown in Figure 13, the device includes:

[0098] The first identification module 1302 is used to perform semantic recognition on the first point cloud sensed by the lidar on the target vehicle at the current time to obtain the first lane line information and the first traffic sign recognition system (TSR) information. The first lane line information is used to identify the lane line identified by the lidar in the space where the target vehicle is located at the current time, and the first TSR information is used to identify the traffic sign object identified by the lidar in the space where the target vehicle is located at the current time.

[0099] The first acquisition module 1304 is used to acquire the second lane line information and the second TSR information output by the semantic camera on the target vehicle at the current time, wherein the second lane line information is used to identify the lane line identified by the semantic camera in the space where the target vehicle is located at the current time, and the second TSR information is used to identify the traffic sign object identified by the semantic camera in the space where the target vehicle is located at the current time.

[0100] The first determining module 1306 is used to determine the target relative position of the lidar and the semantic camera at the current time based on the first lane line information, the first TSR information, the second lane line information and the second TSR information, wherein the target relative position is used to represent the relative position between the target object identified by the lidar and the target object identified by the semantic camera, and the target object includes lane lines or traffic sign objects.

[0101] Through the above embodiments, the relative positions of the same lane lines or traffic sign objects identified by both the LiDAR and the semantic camera can be determined from the lane line information and TSR information identified by each sensor. The point cloud scanned by the LiDAR better reflects the position and distance of the identified objects, while the output of the semantic camera better reflects the type of the identified objects. In this way, the positions and distances of the objects scanned by the LiDAR are accurately matched with the objects captured by the semantic camera. This technical solution solves the problem of low matching accuracy between objects identified by the semantic camera and objects identified by the LiDAR in related technologies, and thereby improves that matching accuracy.

[0102] In one exemplary embodiment, the first identification module includes:

[0103] A segmentation unit is used to divide each point in the first point cloud into a first candidate point cloud and a second candidate point cloud, wherein each point in the first candidate point cloud is located in the same plane as the target vehicle in the first coordinate system where the lidar is located, and each point in the second candidate point cloud is located in a different plane from the target vehicle in the first coordinate system.

[0104] The acquisition unit is used to acquire the first lane line information based on the first candidate point cloud, and to acquire the first TSR information based on the second candidate point cloud.
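A minimal sketch of the segmentation unit, assuming that "in the same plane as the target vehicle" can be approximated by the dominant plane found with RANSAC (a swapped-in, commonly used technique; the source does not name one). `split_ground`, `dist_thresh`, and `iters` are hypothetical.

```python
import numpy as np


def split_ground(points: np.ndarray, dist_thresh: float = 0.15,
                 iters: int = 100, seed: int = 0):
    """Split an (N, 3+) LiDAR cloud into ground and non-ground points.

    Fits the dominant plane with RANSAC; extra columns (e.g. intensity) are
    carried through unchanged. Thresholds are assumptions, not source values.
    """
    rng = np.random.default_rng(seed)
    xyz = points[:, :3]
    best = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        a, b, c = xyz[rng.choice(len(xyz), size=3, replace=False)]
        normal = np.cross(b - a, c - a)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:
            continue  # collinear sample, no unique plane
        dist = np.abs((xyz - a) @ (normal / norm))
        inliers = dist < dist_thresh
        if inliers.sum() > best.sum():
            best = inliers
    # First candidate cloud (ground), second candidate cloud (non-ground).
    return points[best], points[~best]
```

On sloped or curbed roads a single global plane is a coarse approximation; per-sector plane fits are a common refinement.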

[0105] In one exemplary embodiment, the acquisition unit is configured to:

[0106] Obtain the reflection intensity of each point in the first candidate point cloud, wherein the first point cloud carries the reflection intensity of each point;

[0107] In the first candidate point cloud, the points whose reflection intensity satisfies a preset matching condition with respect to a preset reflection intensity threshold are selected to obtain the second point cloud;

[0108] The second point cloud is fitted to obtain the first lane line information.
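The intensity selection and fitting steps might look as follows. The source only speaks of a "preset matching condition" with a reflection intensity threshold; reading it as "intensity >= threshold" and fitting y as a polynomial in x are assumptions, as are the threshold value and the helper name `fit_lane_line`.

```python
import numpy as np


def fit_lane_line(ground_pts: np.ndarray, intensity_thresh: float = 40.0,
                  degree: int = 3):
    """Select high-intensity ground points and fit y = f(x) with a polynomial.

    ground_pts has columns x, y, z, intensity. Using ">= threshold" as the
    matching condition is an assumption (lane paint is usually retroreflective).
    Returns (lane_points, coeffs) with coeffs highest power first, or None.
    """
    lane = ground_pts[ground_pts[:, 3] >= intensity_thresh]
    if len(lane) <= degree:
        return None  # too few points for a well-posed fit
    coeffs = np.polyfit(lane[:, 0], lane[:, 1], degree)
    return lane, coeffs


xs = np.arange(0.0, 31.0)
paint = np.column_stack([xs, 1.5 + 0.01 * xs, np.zeros_like(xs), np.full_like(xs, 80.0)])
asphalt = np.column_stack([xs, np.full_like(xs, -1.0), np.zeros_like(xs), np.full_like(xs, 10.0)])
lane, coeffs = fit_lane_line(np.vstack([paint, asphalt]), degree=1)
print(len(lane), coeffs)  # 31 points, coefficients close to [0.01, 1.5]
```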

[0109] In one exemplary embodiment, the acquisition unit is further configured to:

[0110] When the second candidate point cloud includes N points and N equals 2, one of the N points is taken as the center (the first reference point), and a second reference point whose distance from the first reference point is less than or equal to the target radius is determined in the second candidate point cloud, thus obtaining a third point cloud. The first reference point and the second reference point are used to identify a target traffic sign object of the target type. The third point cloud is then fitted to obtain the first TSR information; and / or

[0111] When the second candidate point cloud includes the N points and N is a positive integer greater than 2, the following steps are performed to obtain the TSR information corresponding to the i-th point, wherein the first TSR information includes the TSR information corresponding to each of the N points: determining, in the second candidate point cloud, a fourth point cloud containing the i-th point, wherein the distance between each point in the fourth point cloud and at least one other point in the fourth point cloud is less than or equal to the target radius, and each point in the fourth point cloud is used to identify the target traffic sign object of the target type; and fitting the fourth point cloud to obtain the TSR information corresponding to the i-th point, where i is a positive integer greater than or equal to 1 and less than or equal to N.
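The radius criterion in [0111], where each point lies within the target radius of at least one other point of the same object, is what single-linkage Euclidean clustering produces. The sketch below assumes that reading; `radius_clusters`, `fit_pole`, and the parameter values are hypothetical, and `fit_pole` is only a stand-in for whatever object fitting is actually used.

```python
from collections import deque

import numpy as np


def radius_clusters(points: np.ndarray, radius: float = 0.5, min_size: int = 2):
    """Group points so each member lies within `radius` of another member.

    Breadth-first Euclidean (single-linkage) clustering; radius and min_size
    are assumed values. Returns a list of index lists, one per cluster.
    """
    labels = np.full(len(points), -1)
    clusters = []
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        cid = len(clusters)
        labels[seed] = cid
        members, queue = [seed], deque([seed])
        while queue:
            i = queue.popleft()
            near = np.linalg.norm(points - points[i], axis=1) <= radius
            for j in np.nonzero(near & (labels == -1))[0]:
                labels[j] = cid
                members.append(int(j))
                queue.append(int(j))
        clusters.append(members)
    # Singletons (isolated noise) are dropped by min_size.
    return [c for c in clusters if len(c) >= min_size]


def fit_pole(points: np.ndarray, idx):
    """Toy 'fit': a vertical pole as its mean (x, y) and height range."""
    pts = points[idx]
    return pts[:, 0].mean(), pts[:, 1].mean(), pts[:, 2].min(), pts[:, 2].max()
```

With N = 2 this reduces to the case of [0110]: the two points form one cluster exactly when they are within the target radius of each other.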

[0112] In one exemplary embodiment, the first determining module includes:

[0113] The first determining unit is configured to determine, based on the first lane line information and the second lane line information, the first preliminary relative position of the lidar and the semantic camera at the current time, wherein the first preliminary relative position is used to represent the preliminary relative position of the target object identified by the lidar and the target object identified by the semantic camera on a horizontal plane, and the horizontal plane is the plane where the semantic camera is located;

[0114] A first processing unit is configured to adjust the first preliminary relative position based on the first TSR information and the second TSR information to obtain the first relative position of the lidar and the semantic camera at the current time, and to determine the second relative position of the lidar and the semantic camera at the current time based on the first TSR information and the second TSR information. The target relative position includes the first relative position and the second relative position. The first relative position is used to represent the relative position of the target object identified by the lidar and the target object identified by the semantic camera on the horizontal plane, and the second relative position is used to represent the relative position of the target object identified by the lidar and the target object identified by the semantic camera on the vertical plane, wherein the vertical plane is a plane perpendicular to the horizontal plane.

[0115] In one exemplary embodiment, the first determining unit is configured to:

[0116] Starting from a preset first initial relative position, adjust the first current relative position until a first target condition is met, and determine the first current relative position at which the first target condition is met as the first preliminary relative position;

[0117] The first target condition includes: the sum of the loss function values of M points is less than or equal to a preset first target threshold, wherein the M points are the points of the lane line identified by the lidar in the space where the target vehicle is located, obtained by projecting the first lane line information into the second coordinate system where the semantic camera is located according to the first current relative position.

[0118] The loss function values for the M points are obtained through the following steps:

[0119] The first projection position of the j-th point in the second coordinate system and the first reference position of the j-th point in the second coordinate system are input into a preset first loss function to obtain the loss function value of the j-th point. The loss function value of the j-th point is used to represent the positional difference between the projection position of the j-th point in the second coordinate system and the reference position of the j-th point in the second coordinate system. M is a positive integer greater than or equal to 1, and j is a positive integer greater than or equal to 1 and less than or equal to M.
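The first target condition of [0117] to [0119] can be written out directly. In this sketch the per-point loss is assumed to be the squared Euclidean distance, and the current relative position is assumed to be a yaw rotation plus a translation (tx, ty, tz); neither choice is specified in the source, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def project(p: Vec3, pose: Sequence[float]) -> Vec3:
    """Rotate p by yaw about the vertical axis, then translate by (tx, ty, tz)."""
    tx, ty, tz, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty, p[2] + tz)


def point_loss(proj: Vec3, ref: Vec3) -> float:
    """Per-point loss: squared distance between projection and reference."""
    return sum((a - b) ** 2 for a, b in zip(proj, ref))


def first_target_condition(lidar_pts: List[Vec3], ref_pts: List[Vec3],
                           pose: Sequence[float], threshold: float):
    """Return (condition_met, total_loss) for the M projected lane points."""
    total = sum(point_loss(project(p, pose), r) for p, r in zip(lidar_pts, ref_pts))
    return total <= threshold, total


lidar = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
ref = [(0.0, 1.0, 0.0), (-2.0, 0.0, 0.0)]
print(first_target_condition(lidar, ref, (0.0, 0.0, 0.0, 0.0), 1e-6))             # (False, 10.0)
print(first_target_condition(lidar, ref, (0.0, 0.0, 0.0, math.pi / 2), 1e-6)[0])  # True
```

The same structure applies to the second and third target conditions, with the Q or L projected traffic sign points and their own thresholds.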

[0120] In one exemplary embodiment, the first processing unit is configured to:

[0121] Starting from the first preliminary relative position, adjust the first current relative position until the second target condition is met, and determine the first current relative position when the second target condition is met as the first relative position;

[0122] The second target condition includes: the sum of the loss function values of Q points is less than or equal to a second target threshold, wherein the Q points are the points of traffic sign objects identified by the lidar in the space where the target vehicle is located, obtained by projecting the first TSR information into the second coordinate system according to the first current relative position;

[0123] The loss function values for the Q points are obtained through the following steps:

[0124] The second projection position of the p-th point in the second coordinate system and the second reference position of the p-th point in the second coordinate system are input into a preset second loss function to obtain the loss function value of the p-th point. The loss function value of the p-th point is used to represent the positional difference between the projection position of the p-th point in the second coordinate system and the reference position of the p-th point in the second coordinate system. Q is a positive integer greater than or equal to 1, and p is a positive integer greater than or equal to 1 and less than or equal to Q.

[0125] In one exemplary embodiment, the first processing unit is configured to:

[0126] Starting from a preset second initial relative position, adjust the second current relative position until a third target condition is met, and determine the second current relative position when the third target condition is met as the second relative position;

[0127] The third target condition includes: the sum of the loss function values of L points is less than or equal to a third target threshold, wherein the L points are the points of traffic sign objects identified by the lidar in the space where the target vehicle is located, obtained by projecting the first TSR information into the second coordinate system according to the second current relative position;

[0128] The loss function values for the L points are obtained through the following steps:

[0129] The second projection position of the q-th point among the L points in the second coordinate system and the second reference position of the q-th point in the second coordinate system are input into a preset third loss function to obtain the loss function value of the q-th point. The loss function value of the q-th point is used to represent the positional difference between the projection position of the q-th point in the second coordinate system and the reference position of the q-th point in the second coordinate system. L is a positive integer greater than or equal to 1, and q is a positive integer greater than or equal to 1 and less than or equal to L.

[0130] In one exemplary embodiment, the device further includes:

[0131] The second identification module is used, after the target relative position of the lidar and the semantic camera at the current time has been determined based on the first lane line information, the first TSR information, the second lane line information, and the second TSR information, to perform semantic recognition on the fifth point cloud sensed by the lidar on the target vehicle at the next time to obtain fourth lane line information and fourth TSR information. The fourth lane line information is used to identify the lane line identified by the lidar in the space where the target vehicle is located at the next time after the current time, and the fourth TSR information is used to identify the traffic sign object identified by the lidar in the space where the target vehicle is located at the next time.

[0132] The second acquisition module is used to acquire the fifth lane line information and the fifth TSR information output by the semantic camera on the target vehicle at the next time, wherein the fifth lane line information is used to identify the lane line identified by the semantic camera in the space where the target vehicle is located at the next time, and the fifth TSR information is used to identify the traffic sign object identified by the semantic camera in the space where the target vehicle is located at the next time.

[0133] The second determining module is used to determine the next relative position of the lidar and the semantic camera at the next time based on the fourth lane line information, the fourth TSR information, the fifth lane line information, the fifth TSR information, and the first TSR information.

[0134] In one exemplary embodiment, the second determining module includes:

[0135] The union unit is used to perform a union operation on the points of traffic sign objects in the first TSR information and the points of traffic sign objects in the fourth TSR information to obtain the sixth TSR information.

[0136] The second determining unit is used to determine the second preliminary relative position of the lidar and the semantic camera in the next time step based on the fourth lane line information and the fifth lane line information.

[0137] The second processing unit is configured to adjust the second preliminary relative position according to the fifth TSR information and the sixth TSR information to obtain the third relative position corresponding to the lidar and the semantic camera in the next time step, and determine the fourth relative position corresponding to the lidar and the semantic camera in the next time step according to the fifth TSR information and the sixth TSR information, wherein the next relative position includes the third relative position and the fourth relative position.

[0138] Embodiments of this application also provide a storage medium including a stored program, wherein the program executes any of the methods described above when it is run.

[0139] Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:

[0140] S11, semantic recognition is performed on the first point cloud sensed by the lidar on the target vehicle at the current time to obtain the first lane line information and the first traffic sign recognition system (TSR) information. The first lane line information is used to identify the lane line identified by the lidar in the space where the target vehicle is located at the current time, and the first TSR information is used to identify the traffic sign object identified by the lidar in the space where the target vehicle is located at the current time.

[0141] S12, acquire the second lane line information and the second TSR information output by the semantic camera on the target vehicle at the current time, wherein the second lane line information is used to identify the lane line identified by the semantic camera in the space where the target vehicle is located at the current time, and the second TSR information is used to identify the traffic sign object identified by the semantic camera in the space where the target vehicle is located at the current time.

[0142] S13, based on the first lane line information, the first TSR information, the second lane line information, and the second TSR information, determine the target relative position corresponding to the lidar and the semantic camera at the current time, wherein the target relative position is used to represent the relative position between the target object identified by the lidar and the target object identified by the semantic camera, and the target object includes lane lines or traffic sign objects.

[0143] Optionally, in this embodiment, the storage medium may also be configured to store program code for performing the following steps:

[0144] S21, semantic recognition is performed on the first point cloud sensed by the lidar on the target vehicle at the current time to obtain the first lane line information and the first traffic sign recognition system (TSR) information. The first lane line information is used to identify the lane line identified by the lidar in the space where the target vehicle is located at the current time, and the first TSR information is used to identify the traffic sign object identified by the lidar in the space where the target vehicle is located at the current time.

[0145] S22, acquire the second lane line information and the second TSR information output by the semantic camera on the target vehicle at the current time, wherein the second lane line information is used to identify the lane line identified by the semantic camera in the space where the target vehicle is located at the current time, and the second TSR information is used to identify the traffic sign object identified by the semantic camera in the space where the target vehicle is located at the current time;

[0146] S23, based on the first lane line information, the first TSR information, the second lane line information, and the second TSR information, determine the target relative position corresponding to the lidar and the semantic camera at the current time, wherein the target relative position is used to represent the relative position between the target object identified by the lidar and the target object identified by the semantic camera, and the target object includes lane lines or traffic sign objects.

[0147] Embodiments of this application also provide an electronic device including a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.

[0148] Optionally, the electronic device may further include a transmission device and an input / output device, wherein the transmission device is connected to the processor and the input / output device is connected to the processor.

[0149] Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.

[0150] Optionally, specific examples in this embodiment can refer to the examples described in the above embodiments and optional implementations, and will not be repeated here.

[0151] Obviously, those skilled in the art should understand that the modules or steps of this application described above can be implemented using general-purpose computing devices. They can be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they can be implemented using computer-executable program code, thereby storing them in a storage device for execution by a computing device. In some cases, the steps shown or described can be performed in a different order than those presented here, or they can be fabricated as separate integrated circuit modules, or multiple modules or steps can be fabricated as a single integrated circuit module. Thus, this application is not limited to any particular combination of hardware and software.

[0152] The above description is only a preferred embodiment of this application. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of this application, and these improvements and modifications should also be considered within the scope of protection of this application.

Claims

1. A method for determining relative position, characterized in that the method comprises the following steps: Semantic recognition is performed on the first point cloud sensed by the lidar on the target vehicle at the current time to obtain first lane line information and first TSR information. The first lane line information is used to identify the lane line identified by the lidar in the space where the target vehicle is located at the current time, and the first TSR information is used to identify the traffic sign object identified by the lidar in the space where the target vehicle is located at the current time. The second lane line information and the second TSR information output by the semantic camera on the target vehicle at the current time are obtained, wherein the second lane line information is used to identify the lane line identified by the semantic camera in the space where the target vehicle is located at the current time, and the second TSR information is used to identify the traffic sign object identified by the semantic camera in the space where the target vehicle is located at the current time.
Based on the first lane line information, the first TSR information, the second lane line information, and the second TSR information, the target relative position of the lidar and the semantic camera at the current time is determined, wherein the target relative position is used to represent the relative position between the target object identified by the lidar and the target object identified by the semantic camera, and the target object includes lane lines or traffic sign objects; the step of determining the target relative position of the LiDAR and the semantic camera at the current time based on the first lane line information, the first TSR information, the second lane line information, and the second TSR information includes: determining a first preliminary relative position of the LiDAR and the semantic camera at the current time based on the first lane line information and the second lane line information, wherein the first preliminary relative position is used to represent the preliminary relative position of the target object identified by the LiDAR and the target object identified by the semantic camera on a horizontal plane, and the horizontal plane is the plane where the semantic camera is located; adjusting the first preliminary relative position based on the first TSR information and the second TSR information to obtain the first relative position of the LiDAR and the semantic camera at the current time; and determining the second relative position of the LiDAR and the semantic camera at the current time based on the first TSR information and the second TSR information. The target relative position includes the first relative position and the second relative position. The first relative position is used to represent the relative position of the target object identified by the LiDAR and the target object identified by the semantic camera on the horizontal plane.
The second relative position is used to represent the relative position of the target object identified by the LiDAR and the target object identified by the semantic camera on the vertical plane. The vertical plane is a plane perpendicular to the horizontal plane.

2. The method according to claim 1, characterized in that the step of performing semantic recognition on the first point cloud sensed by the lidar on the target vehicle at the current time to obtain the first lane line information and the first TSR information includes: dividing the points in the first point cloud into a first candidate point cloud and a second candidate point cloud, wherein the points in the first candidate point cloud are located in the same plane as the target vehicle in the first coordinate system where the lidar is located, and the points in the second candidate point cloud are located in a different plane from the target vehicle in the first coordinate system; and obtaining the first lane line information based on the first candidate point cloud, and obtaining the first TSR information based on the second candidate point cloud.

3. The method according to claim 2, characterized in that the step of obtaining the first lane line information based on the first candidate point cloud includes: obtaining the reflection intensity of each point in the first candidate point cloud, wherein the first point cloud carries the reflection intensity of each point; determining, in the first candidate point cloud, the points whose reflection intensity satisfies a preset matching condition with respect to a preset reflection intensity threshold, to obtain a second point cloud; and fitting the second point cloud to obtain the first lane line information.
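For claim 3, the following sketch assumes the "preset matching condition" is simply `intensity >= intensity_threshold` (lane paint is usually more reflective than asphalt), that the retained points belong to a single lane line, and that the line is modelled as `y = a + b * x` fitted by ordinary least squares. Real lane models are often higher-order polynomials or splines; the straight-line model, the threshold value, and the function name are illustrative assumptions.

```python
def lane_from_intensity(first_candidates, intensity_threshold=0.6):
    """Keep high-reflectivity points (the 'second point cloud') and fit y = a + b * x.

    Points are (x, y, z, intensity) tuples. Returns (a, b), or None if no line can be fitted.
    """
    # Assumed matching condition: reflection intensity at or above the threshold.
    second_cloud = [p for p in first_candidates if p[3] >= intensity_threshold]
    n = len(second_cloud)
    if n < 2:
        return None
    mean_x = sum(p[0] for p in second_cloud) / n
    mean_y = sum(p[1] for p in second_cloud) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in second_cloud)
    if sxx == 0:
        return None  # all points share one x; y = a + b * x cannot represent them
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in second_cloud)
    b = sxy / sxx          # least-squares slope
    a = mean_y - b * mean_x  # least-squares intercept
    return a, b
```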

4. The method according to claim 2, characterized in that the step of obtaining the first TSR information based on the second candidate point cloud includes: when the second candidate point cloud includes N points and N equals 2, taking one of the N points as the center and determining, in the second candidate point cloud, a first reference point whose distance from that center point is less than or equal to a target radius, thus obtaining a third point cloud, wherein the center point and the first reference point are used to identify a traffic sign object of a target type, and fitting the third point cloud to obtain the first TSR information; and/or, when the second candidate point cloud includes the N points and N is a positive integer greater than 2, performing the following steps to obtain the TSR information corresponding to the i-th point, wherein the first TSR information includes the TSR information corresponding to each of the N points: determining, in the second candidate point cloud, a fourth point cloud containing the i-th point, wherein each point in the fourth point cloud is within the target radius of at least one other point in the fourth point cloud, and each point in the fourth point cloud is used to identify the traffic sign object of the target type; and fitting the fourth point cloud to obtain the TSR information corresponding to the i-th point, where i is a positive integer with 1 ≤ i ≤ N.
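The N > 2 branch of claim 4 grows a cluster in which every point lies within the target radius of another cluster member, which amounts to fixed-radius connected-component (region-growing) clustering. A minimal sketch, assuming points are `(x, y, z)` tuples and that "fitting" a cluster means summarizing it by its centroid; the claim does not specify the fitted model, and the function names are hypothetical.

```python
import math
from collections import deque


def radius_cluster(points, seed_index, target_radius):
    """Return sorted indices of the cluster (the 'fourth point cloud') containing points[seed_index].

    Every point added lies within target_radius of at least one point already in the cluster.
    """
    cluster = {seed_index}
    queue = deque([seed_index])
    while queue:
        i = queue.popleft()
        for j, q in enumerate(points):
            if j not in cluster and math.dist(points[i][:3], q[:3]) <= target_radius:
                cluster.add(j)
                queue.append(j)
    return sorted(cluster)


def fit_sign(points, indices):
    """Assumed 'fitting': summarize a sign cluster by the centroid of its points."""
    n = len(indices)
    return tuple(sum(points[k][d] for k in indices) / n for d in range(3))
```

Calling `radius_cluster` for every index i reproduces the per-point loop of the claim; points in the same cluster yield the same TSR entry, so an implementation would normally cache results per cluster. The brute-force neighbour search is O(N²); a k-d tree is the usual replacement for large clouds.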

5. The method according to claim 1, characterized in that determining the first preliminary relative position of the lidar and the semantic camera at the current time based on the first lane line information and the second lane line information includes: starting from a preset first initial relative position, adjusting a first current relative position until a first target condition is met, and determining the first current relative position at which the first target condition is met as the first preliminary relative position. The first target condition includes: the sum of the loss function values of M points is less than or equal to a preset first target threshold, wherein the M points are the points of the lane line identified by the lidar in the space where the target vehicle is located, obtained by projecting the first lane line information onto a second coordinate system in which the semantic camera is located according to the first current relative position. The loss function values of the M points are obtained as follows: a first projection position of the j-th point in the second coordinate system and a first reference position of the j-th point in the second coordinate system are input into a preset first loss function to obtain the loss function value of the j-th point, which represents the positional difference between the projection position and the reference position of the j-th point in the second coordinate system. M is a positive integer greater than or equal to 1, and j is a positive integer with 1 ≤ j ≤ M.
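Claim 5 leaves both the loss function and the adjustment strategy open. The sketch below assumes the first relative position is `(dx, dy, yaw)`, that each projected lidar lane point already has a matching reference position derived from the camera's lane output, and that the first loss function is the squared Euclidean distance. For the adjustment it uses a derivative-free compass (coordinate pattern) search that stops as soon as the loss sum reaches the threshold; this optimizer is a stand-in chosen for brevity, not one named in the patent, and all function names are hypothetical.

```python
import math


def project_xy(pos, pt):
    """Project a lidar point's (x, y) into the camera frame using pos = (dx, dy, yaw)."""
    dx, dy, yaw = pos
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * pt[0] - s * pt[1] + dx, s * pt[0] + c * pt[1] + dy)


def total_loss(pos, lidar_pts, reference_pts):
    """Sum of per-point losses; the first loss function is assumed to be squared distance."""
    total = 0.0
    for p, r in zip(lidar_pts, reference_pts):
        px, py = project_xy(pos, p)
        total += (px - r[0]) ** 2 + (py - r[1]) ** 2
    return total


def adjust_until(initial, lidar_pts, reference_pts, threshold,
                 step=0.5, min_step=1e-6, max_rounds=10_000):
    """Compass search from `initial` until the loss sum <= threshold (first target condition)."""
    pos = list(initial)
    best = total_loss(pos, lidar_pts, reference_pts)
    rounds = 0
    while best > threshold and step > min_step and rounds < max_rounds:
        rounds += 1
        improved = False
        for k in range(3):                # try each parameter in turn: dx, dy, yaw
            for delta in (step, -step):
                trial = pos.copy()
                trial[k] += delta
                loss = total_loss(trial, lidar_pts, reference_pts)
                if loss < best:
                    pos, best, improved = trial, loss, True
                    break                 # accept the move, go on to the next parameter
        if not improved:
            step /= 2                     # no single-parameter move helped: refine the step
    return tuple(pos), best
```

Stopping at the threshold rather than at a true minimum mirrors the wording of the first target condition; the step-size floor and round limit are added guards so the loop also terminates when the threshold is unreachable.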

6. The method according to claim 5, characterized in that the step of adjusting the first preliminary relative position based on the first TSR information and the second TSR information to obtain the first relative position of the lidar and the semantic camera at the current time includes: starting from the first preliminary relative position, adjusting the first current relative position until a second target condition is met, and determining the first current relative position at which the second target condition is met as the first relative position. The second target condition includes: the sum of the loss function values of Q points is less than or equal to a second target threshold, wherein the Q points are the points of the traffic sign objects identified by the lidar in the space where the target vehicle is located, obtained by projecting the first TSR information onto the second coordinate system according to the first current relative position. The loss function values of the Q points are obtained as follows: a second projection position of the p-th point in the second coordinate system and a second reference position of the p-th point in the second coordinate system are input into a preset second loss function to obtain the loss function value of the p-th point, which represents the positional difference between the projection position and the reference position of the p-th point in the second coordinate system. Q is a positive integer greater than or equal to 1, and p is a positive integer with 1 ≤ p ≤ Q.
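The second target condition of claim 6 reuses the horizontal parameters but scores traffic sign points instead of lane points. Because the first relative position only covers the horizontal plane, the sketch below projects only the `(x, y)` footprint of each lidar sign point and ignores its height. It assumes the same `(dx, dy, yaw)` form as above, a squared-distance second loss function, and one camera reference position per lidar sign point; the function name is hypothetical.

```python
import math


def second_condition_met(position, tsr_lidar_xyz, tsr_camera_xy, threshold):
    """Check the second target condition for a candidate first relative position.

    position: (dx, dy, yaw) in the horizontal plane (assumed parameterization).
    tsr_lidar_xyz: lidar traffic sign points (x, y, z); z is ignored because the
    first relative position only describes the horizontal plane.
    tsr_camera_xy: matching reference positions (x, y) from the camera's TSR output.
    Returns (condition_met, loss_sum).
    """
    if len(tsr_lidar_xyz) != len(tsr_camera_xy):
        raise ValueError("each lidar TSR point needs exactly one reference position")
    dx, dy, yaw = position
    c, s = math.cos(yaw), math.sin(yaw)
    total = 0.0
    for (x, y, _z), (rx, ry) in zip(tsr_lidar_xyz, tsr_camera_xy):
        px, py = c * x - s * y + dx, s * x + c * y + dy
        total += (px - rx) ** 2 + (py - ry) ** 2  # assumed second loss: squared distance
    return total <= threshold, total
```

Starting the adjustment from the first preliminary relative position means the sign points only have to correct what the lane lines left undetermined. For instance, a candidate that is correct laterally but 0.5 m off along the road fails this check when two sign references are consistent with the true offset, since each sign contributes 0.25 to the loss sum.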

7. The method according to claim 5, characterized in that determining the second relative position of the lidar and the semantic camera at the current time based on the first TSR information and the second TSR information includes: starting from a preset second initial relative position, adjusting a second current relative position until a third target condition is met, and determining the second current relative position at which the third target condition is met as the second relative position. The third target condition includes: the sum of the loss function values of L points is less than or equal to a third target threshold, wherein the L points are the points of the traffic sign objects identified by the lidar in the space where the target vehicle is located, obtained by projecting the first TSR information onto the second coordinate system according to the second current relative position. The loss function values of the L points are obtained as follows: the second projection position of the q-th of the L points in the second coordinate system and the second reference position of the q-th point in the second coordinate system are input into a preset third loss function to obtain the loss function value of the q-th point, which represents the positional difference between the projection position and the reference position of the q-th point in the second coordinate system. L is a positive integer greater than or equal to 1, and q is a positive integer with 1 ≤ q ≤ L.
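For claim 7, a minimal sketch under the assumption that the second relative position is a single vertical offset `dz` and that the third loss function is the squared height difference between each projected lidar sign point and its camera reference. Each adjustment subtracts the mean residual; for this particular loss that step lands on the least-squares optimum immediately, so the loop mainly matters if a different loss or update rule is substituted. The function name and parameters are hypothetical.

```python
def adjust_vertical(initial_dz, lidar_z, camera_z, threshold, max_iters=50):
    """Adjust the assumed vertical offset dz until the sum of squared height errors <= threshold.

    lidar_z: heights of lidar traffic sign points; camera_z: matching reference heights.
    Returns (dz, loss_sum).
    """
    if not lidar_z or len(lidar_z) != len(camera_z):
        raise ValueError("need one camera reference height per lidar point")

    def residuals(dz):
        # Projected lidar height minus reference height, per point.
        return [z + dz - rz for z, rz in zip(lidar_z, camera_z)]

    dz = initial_dz
    res = residuals(dz)
    loss = sum(r * r for r in res)
    for _ in range(max_iters):
        if loss <= threshold:
            break  # third target condition met
        step = sum(res) / len(res)
        if step == 0.0:
            break  # already optimal for dz alone; the threshold cannot be reached
        dz -= step
        res = residuals(dz)
        loss = sum(r * r for r in res)
    return dz, loss
```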

8. The method according to claim 1, characterized in that, after the target relative position corresponding to the lidar and the semantic camera at the current time is determined based on the first lane line information, the first TSR information, the second lane line information, and the second TSR information, the method further includes: performing semantic recognition on a fifth point cloud sensed by the lidar on the target vehicle to obtain fourth lane line information and fourth TSR information, wherein the fourth lane line information identifies the lane line recognized by the lidar in the space where the target vehicle is located at the next time after the current time, and the fourth TSR information identifies the traffic sign object recognized by the lidar in that space at the next time; acquiring fifth lane line information and fifth TSR information output by the semantic camera on the target vehicle, wherein the fifth lane line information identifies the lane line recognized by the semantic camera in the space where the target vehicle is located at the next time, and the fifth TSR information identifies the traffic sign object recognized by the semantic camera in that space at the next time; and determining the next relative position of the lidar and the semantic camera at the next time based on the fourth lane line information, the fourth TSR information, the fifth lane line information, the fifth TSR information, and the first TSR information.
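Claim 8 makes the estimation stateful: the lidar TSR information of the current time is carried into the estimate for the next time. The sketch below shows only that data flow. `estimate` is a hypothetical callable standing in for the per-time estimation procedure, and the class and method names are likewise illustrative.

```python
from typing import Callable, List, Optional, Sequence


class RelativePositionTracker:
    """Feeds the previous time's lidar TSR points into the next relative-position estimate."""

    def __init__(self, estimate: Callable):
        # estimate(lidar_lane, lidar_tsr, camera_lane, camera_tsr, previous_lidar_tsr)
        # is a hypothetical stand-in for the per-time estimation procedure.
        self.estimate = estimate
        self.previous_lidar_tsr: Optional[List] = None

    def step(self, lidar_lane, lidar_tsr: Sequence, camera_lane, camera_tsr):
        previous = self.previous_lidar_tsr if self.previous_lidar_tsr is not None else []
        result = self.estimate(lidar_lane, lidar_tsr, camera_lane, camera_tsr, previous)
        # Remember this time's lidar TSR points (the "first TSR information" when
        # this is the current time) for use at the next time.
        self.previous_lidar_tsr = list(lidar_tsr)
        return result
```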

9. The method according to claim 8, characterized in that the step of determining the next relative position of the lidar and the semantic camera at the next time based on the fourth lane line information, the fourth TSR information, the fifth lane line information, the fifth TSR information, and the first TSR information includes: obtaining sixth TSR information by taking the union of the traffic sign object points in the first TSR information and the traffic sign object points in the fourth TSR information; determining a second preliminary relative position of the lidar and the semantic camera at the next time based on the fourth lane line information and the fifth lane line information; adjusting the second preliminary relative position based on the fifth TSR information and the sixth TSR information to obtain a third relative position of the lidar and the semantic camera at the next time; and determining a fourth relative position of the lidar and the semantic camera at the next time based on the fifth TSR information and the sixth TSR information, wherein the next relative position includes the third relative position and the fourth relative position.
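The union in claim 9 can be sketched as a set union with deduplication. The claim says only "union"; the grid-based deduplication with cell size `resolution` is an assumption so that a sign point seen at both times is not counted twice, and the sketch further assumes both point sets are already expressed in one common coordinate system (the claims do not say how vehicle motion between the two times is handled). The function name is hypothetical.

```python
def union_tsr_points(first_tsr, fourth_tsr, resolution=0.05):
    """Sixth TSR information: union of two traffic sign point sets, deduplicated on a grid.

    Points are (x, y, z, ...) tuples in one common coordinate system (assumed).
    Two points falling in the same cubic cell of side `resolution` count as one.
    """
    seen = set()
    merged = []
    for p in [*first_tsr, *fourth_tsr]:
        key = tuple(round(c / resolution) for c in p[:3])  # grid cell of the point
        if key not in seen:
            seen.add(key)
            merged.append(p)
    return merged
```

Grid deduplication is approximate: two nearby points on opposite sides of a cell boundary are both kept. A radius-based merge would avoid that at the cost of a neighbour search.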

10. A device for determining relative position, characterized in that it includes: a first identification module, configured to perform semantic recognition on the first point cloud sensed by the lidar on the target vehicle at the current time to obtain first lane line information and first TSR information, wherein the first lane line information identifies the lane line recognized by the lidar in the space where the target vehicle is located at the current time, and the first TSR information identifies the traffic sign object recognized by the lidar in that space at the current time; a first acquisition module, configured to acquire second lane line information and second TSR information output by the semantic camera on the target vehicle at the current time, wherein the second lane line information identifies the lane line recognized by the semantic camera in the space where the target vehicle is located at the current time, and the second TSR information identifies the traffic sign object recognized by the semantic camera in that space at the current time; and a first determining module, configured to determine the target relative position corresponding to the lidar and the semantic camera at the current time based on the first lane line information, the first TSR information, the second lane line information, and the second TSR information, wherein the target relative position represents the relative position between the target object identified by the lidar and the target object identified by the semantic camera, and the target object includes lane lines or traffic sign objects. The first determining module includes: a first determining unit, configured to determine a first preliminary relative position of the lidar and the semantic camera at the current time based on the first lane line information and the second lane line information, wherein the first preliminary relative position represents the preliminary relative position, on a horizontal plane, of the target object identified by the lidar and the target object identified by the semantic camera, the horizontal plane being the plane in which the semantic camera is located; and a first processing unit, configured to adjust the first preliminary relative position based on the first TSR information and the second TSR information to obtain a first relative position of the lidar and the semantic camera at the current time, and to determine a second relative position of the lidar and the semantic camera at the current time based on the first TSR information and the second TSR information, wherein the target relative position includes the first relative position and the second relative position, the first relative position represents the relative position of the target object identified by the lidar and the target object identified by the semantic camera on the horizontal plane, and the second relative position represents the relative position of the target object identified by the lidar and the target object identified by the semantic camera on a vertical plane, the vertical plane being a plane perpendicular to the horizontal plane.
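The module structure of claim 10 maps naturally onto composition. In this sketch each module or unit is a hypothetical callable injected at construction time, so the wiring can be exercised with stubs; none of the names come from an actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class FirstDeterminingModule:
    # (lidar lane, camera lane) -> first preliminary relative position
    first_determining_unit: Callable[[Any, Any], Any]
    # (preliminary, lidar TSR, camera TSR) -> (first relative position, second relative position)
    first_processing_unit: Callable[[Any, Any, Any], Tuple[Any, Any]]

    def __call__(self, lidar_lane, lidar_tsr, camera_lane, camera_tsr):
        preliminary = self.first_determining_unit(lidar_lane, camera_lane)
        return self.first_processing_unit(preliminary, lidar_tsr, camera_tsr)


@dataclass
class RelativePositionDevice:
    first_identification_module: Callable[[Any], Tuple[Any, Any]]  # point cloud -> (lane, TSR)
    first_acquisition_module: Callable[[], Tuple[Any, Any]]         # camera output -> (lane, TSR)
    first_determining_module: FirstDeterminingModule

    def run(self, point_cloud):
        lidar_lane, lidar_tsr = self.first_identification_module(point_cloud)
        camera_lane, camera_tsr = self.first_acquisition_module()
        # The target relative position is the (first, second) pair.
        return self.first_determining_module(lidar_lane, lidar_tsr, camera_lane, camera_tsr)
```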

11. A computer-readable storage medium, characterized in that the computer-readable storage medium includes a stored program, wherein the program, when run, performs the method of any one of claims 1 to 9.

12. An electronic device comprising a memory and a processor, characterized in that the memory stores a computer program, and the processor is configured to execute the method of any one of claims 1 to 9 by means of the computer program.