Collaborative detection and target identification method based on inherent sensors of robots

By combining lidar, IMU measurement modules, and payload detection equipment in a collaborative detection method, the problem of the small perception range of visual sensors for special robots was solved, enabling long-distance target recognition and high-precision semantic map construction, while reducing costs.

CN117607894BActive Publication Date: 2026-06-23ZHONGBING INTELLIGENT INNOVATION RES INST CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ZHONGBING INTELLIGENT INNOVATION RES INST CO LTD
Filing Date
2023-12-04
Publication Date
2026-06-23

Smart Images

  • Figure CN117607894B_ABST
    Figure CN117607894B_ABST
Patent Text Reader

Abstract

The application relates to a kind of collaborative detection and target identification methods based on robot inherent sensor. It includes: point cloud pretreatment is carried out based on laser radar point cloud data to obtain pretreated point cloud information;Real-time positioning information of the robot is obtained using laser SLAM based on the pretreated point cloud information and IMU measurement data and a map is constructed;The target type and target position are obtained by using the load detection device to identify the point cloud information based on the pretreated point cloud information and the real-time positioning information of the robot;Semantic map is obtained by fusing based on the map, the target type and the target position. High accuracy, long-distance detection results of using robot inherent perception load are realized, without increasing visual equipment, which can assist the robot to realize the identification and labeling of target in real-time positioning and mapping process, and can be used for map construction of special robot in underground space or dangerous operation area.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] This invention relates to the field of target cooperative detection and identification technology, and in particular to a cooperative detection and target identification method based on robot's inherent sensors. Background Technology

[0002] In recent years, with the rise of artificial intelligence, autonomous navigation of robots has become a research hotspot. Simultaneous localization and mapping (SLAM) is one of the core technologies for achieving autonomous navigation, and a common method is to use LiDAR on the robot's chassis. LiDAR has advantages such as high accuracy, large ranging range, and immunity to light conditions, bringing tremendous development opportunities to LiDAR SLAM. However, maps constructed by LiDAR SLAM lack semantic information, which enables robots to better understand their environment and provides information for navigation. Semantic information is usually obtained from image information captured by cameras; LiDAR point clouds cannot provide semantic information. Therefore, how to combine LiDAR and visual camera information to obtain semantic maps has become a key research focus.

[0003] Current common methods for addressing the aforementioned problems involve adding visual sensors to the robot chassis. This enhances the semantic results of target recognition in real-time localization and semantic map construction based on LiDAR, fusing rich visual images and LiDAR SLAM results onto a LiDAR point cloud map through the conversion relationship between vision and LiDAR. However, for specialized robots, these common methods still have the following shortcomings:

[0004] 1. The maximum sensing range of a typical vision sensor is 100 meters, and the field of view is 110 degrees. It cannot identify distant targets and is insufficient to meet the needs of special robots.

[0005] 2. To achieve target recognition, tracking, and engagement, special-purpose robots are often equipped with high-performance visual sensing payloads, and the commonly used method for target recognition using vision is learning-based. For special-purpose robots with visual sensing payloads, adding cameras or other visual sensors to the vehicle chassis for target recognition not only fails to fully utilize the visual information in the existing sensing payloads but also increases computational demands, leading to increased costs. Summary of the Invention

[0006] Based on the above analysis, the embodiments of the present invention aim to provide a collaborative detection and target identification method based on the robot's inherent sensors, in order to solve the problem that existing visual sensors have a small sensing range and that adding additional visual sensors to special robots that are already equipped with visual sensing payloads would increase computing power.

[0007] The objective of this invention is mainly achieved through the following technical solutions:

[0008] This invention provides a cooperative detection and target identification method based on robot's inherent sensors. The sensors include a lidar, an IMU measurement module, and a payload detection device mounted on the robot. The method includes the following steps:

[0009] Preprocessing of point cloud data based on lidar point cloud data and IMU measurement data yields preprocessed point cloud information.

[0010] Based on the preprocessed point cloud information, laser SLAM is used to obtain the robot's real-time positioning information and construct a map;

[0011] Based on the preprocessed point cloud information and the robot's real-time positioning information, the load detection device is used to identify the point cloud information to obtain the target type and target location;

[0012] A semantic map is obtained by fusing the map, the target type, and the target location.

[0013] Furthermore, the load detection device includes a laser rangefinder, a sensing camera, a shifting mechanism, and a processor. By adjusting the attitude of the shifting mechanism and the focal length of the sensing camera, the target in the designated area is clearly photographed and the target type is identified by the processor. The laser rangefinder is used to accurately measure the target position in the designated area.

[0014] Furthermore, the step of using the payload detection device to identify the target type and target location based on the preprocessed point cloud information and the robot's positioning information includes:

[0015] Based on the preprocessed point cloud information, the load detection device automatically adjusts the posture of the rotation mechanism and the focal length of the sensing camera to obtain a clear image of the point cloud information. The laser rangefinder is used to measure the distance between the robot and the location of the point cloud to obtain the rotation matrix of the target and the initial posture of the sensing load detection device.

[0016] Based on the clear image, the target type of the image is obtained by using the target recognition model in the processor to identify the target.

[0017] Furthermore, the rotation matrix between the target and the initial pose of the sensing payload detection device is represented as:

[0018]

[0019] in, Let be the rotation matrix between the target and the initial pose of the sensing payload detection device; d is the attitude rotation matrix of the rotation mechanism; 0 The distance between the robot and the location of the point cloud.

[0020] Furthermore, the step of using a target recognition model to perform target recognition based on the clear image to obtain the target type of the image includes:

[0021] The clear image is preprocessed to adapt to the input format of the target recognition model;

[0022] The preprocessed image is input into a trained target recognition model to obtain the target detection result in the image; wherein, the target recognition model is a model using a YOLOv5 neural network.

[0023] Furthermore, the step of fusing the map information, the target type, and the target location to obtain a semantic map includes:

[0024] When the load detection device identifies target information, the load detection device and the lidar are jointly calibrated.

[0025] Based on the calibration results, the lidar point cloud is projected onto the image of the payload detection device, and the lidar point cloud that is projected to the target location is selected from the lidar point cloud.

[0026] The target type is updated on the map of the lidar point cloud to obtain a semantic map.

[0027] Furthermore, the joint calibration of the payload detection device and the lidar includes:

[0028] The intrinsic parameters of the sensing camera of the payload detection device are calibrated to obtain the intrinsic parameter matrix of the sensing camera:

[0029]

[0030] Where (u,v) are image coordinates; (x,y,z) are the coordinates of the payload camera coordinate system; f / dx is the length of the focal length in the x-axis direction described in pixels; f / dy is the length of the focal length in the y-axis direction described in pixels; f is the function relationship of the camera with respect to the pixel coordinates; and (u0,v0) is the pixel offset.

[0031] Establish the extrinsic parameters of the sensing camera and the lidar:

[0032]

[0033] Where R is the direction of the coordinate axis of the lidar coordinate system relative to the coordinate axis of the payload camera; t is the position of the origin of the lidar coordinate system in the payload camera coordinate system.

[0034] Furthermore, based on the calibration results, the lidar point cloud is projected onto the image of the payload detection device using the following formula:

[0035]

[0036] Among them, (X) L ,Y L Z L (u,v) represents the point cloud coordinates in the lidar coordinate system; (u,v) represents the pixel coordinates projected onto the imaging plane of the sensing camera. Let be the rotation matrix between the target and the initial pose of the sensing payload detection device.

[0037] Furthermore, selecting the lidar point cloud whose projection is located at the target position from the lidar point cloud includes:

[0038] The projected lidar point cloud is filtered and downsampled.

[0039] The downsampled lidar point cloud is clustered, and the class with the most point clouds is selected as the target point cloud.

[0040] Furthermore, the point cloud preprocessing includes:

[0041] Discrete radar points are removed from the lidar point cloud data, and filtering and downsampling are performed.

[0042] After performing point cloud clustering on the filtered point cloud, IMU pre-integration estimation is used to complete the point cloud distortion removal, resulting in processed point cloud information.

[0043] Compared with the prior art, the present invention can achieve at least one of the following beneficial effects:

[0044] 1. This invention utilizes the wide field of view and high-frequency scanning characteristics of lidar to provide initial detection orientation for the robot's visual perception payload. By leveraging the high accuracy and long-distance detection capabilities of the already mounted visual perception payload, the robot can identify and label distant targets during the mapping process.

[0045] 2. This invention utilizes a high-performance visual perception payload that is self-contained in a special robot, enabling high-precision target recognition without the need for additional visual sensors, thereby reducing the robot's load and saving costs.

[0046] In this invention, the above-described technical solutions can be combined with each other to achieve more preferred combinations. Other features and advantages of this invention will be set forth in the following description, and some advantages may become apparent from the description or be learned by practicing the invention. The objects and other advantages of this invention can be realized and obtained from what is particularly pointed out in the description and drawings. Attached Figure Description

[0047] The accompanying drawings are for illustrative purposes only and are not intended to limit the invention. Throughout the drawings, the same reference numerals denote the same parts.

[0048] Figure 1 This is a flowchart illustrating the cooperative detection and target identification method based on the robot's inherent sensors in an embodiment of the present invention.

[0049] Figure 2 This is a block diagram of a target collaborative reconnaissance and map target identification system combining laser SLAM and sensing payload in an embodiment of the present invention. Detailed Implementation

[0050] Preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings, which form part of this application and are used together with the embodiments of the present invention to illustrate the principles of the present invention, but are not intended to limit the scope of the present invention.

[0051] One specific embodiment of the present invention discloses a cooperative detection and target identification method based on robot's inherent sensors, wherein the sensors include a lidar, an IMU measurement module, and a payload detection device mounted on the robot; as shown Figure 1 As shown, it includes the following steps:

[0052] Step S1: Perform point cloud preprocessing based on lidar point cloud data and IMU measurement data to obtain preprocessed point cloud information;

[0053] Specifically, the lidar point cloud is a data set composed of a series of three-dimensional coordinate points acquired by the lidar sensor. Each point represents a spatial location, and can be located in three-dimensional space using coordinate values. The lidar sensor periodically emits a laser beam and measures the time it takes for the laser beam to return to calculate the distance between the target object and the lidar. By using a rotation or scanning mechanism, point cloud data of the target object at different angles can be acquired.

[0054] Furthermore, the main purpose of point cloud preprocessing is to facilitate subsequent point cloud processing and algorithms.

[0055] Specifically, the point cloud preprocessing includes the following steps:

[0056] Step S101: Remove discrete radar points from the lidar point cloud data and perform filtering and downsampling;

[0057] Specifically, the lidar point cloud is subjected to noise reduction processing using filtering methods to reduce the impact of noise; the lidar point cloud is downsampled to reduce the number of data points and speed up subsequent processing.

[0058] Preferably, in this embodiment, the radius filtering method is used to remove discrete radar points, and voxel filtering is used to sample the radar points in the cloud.

[0059] Step S102: After performing point cloud clustering on the filtered point cloud data, IMU pre-integration estimation is used to complete point cloud distortion removal, and the processed point cloud information is obtained.

[0060] Specifically, point cloud clustering aggregates objects in the environment, integrating them together. In multi-sensor fusion systems, due to the high measurement frequency of IMUs, the acquired data contains errors. For each IMU data acquisition moment, it needs to be integrated to obtain attitude and velocity information, which is then combined with LiDAR data for optimization to obtain more accurate pose estimation. However, the integration process of IMU data introduces integration errors and consumes significant computational resources. To address these issues, IMU pre-integration technology is used to store the velocity and displacement increments of the IMU as constants within each time interval, instead of re-integrating each time. During LiDAR SLAM optimization, the results of IMU pre-integration are added as constant constraints to the optimization problem. This way, only the state variables need to be updated in each optimization iteration, while the pre-integration results remain unchanged. This reduces the workload of re-integration each time, improves optimization speed, and enables motion compensation of point clouds.

[0061] Preferably, in this embodiment, KD-Tree is used to perform nearest neighbor search on the filtered point cloud to find the nearest neighbor points, and point cloud clustering is performed by calculating the Euclidean distance of the point cloud.

[0062] Step S2: Based on the preprocessed point cloud information, use laser SLAM to obtain the robot's real-time positioning information and construct a map;

[0063] Specifically, feature points are extracted using the curvature of the preprocessed point cloud information, and the preprocessed point cloud is divided into edge points and planar points to determine whether the objects in the point clouds of two consecutive frames are the same object.

[0064] Furthermore, the curvature is calculated using the following formula by selecting the distance values ​​(distances from the point to the lidar) of the current point cloud and the five points before and after it:

[0065]

[0066] Where c is the curvature of the current point cloud; S is the point cloud set consisting of the current point cloud i (i.e., the laser point i in the kth frame) and the five adjacent points before and after it; {L} is the lidar coordinates; Let i be the coordinates of the laser point i in the k-th frame under {L}; Let represent the coordinates of laser point j, which is a neighbor of laser point i in the k-th frame under {L}.

[0067] Furthermore, after calculating the curvature of all points in a frame, each point is sorted in ascending order based on its curvature, and the calculated curvature is compared with a curvature threshold. When the curvature c is less than the threshold, the LiDAR point is considered a planar feature point; when the curvature c is greater than the threshold, the LiDAR point is considered an edge feature point. Preferably, the curvature threshold is set to 0.1.

[0068] It should be noted that if there are already planar feature points among the five points before and after the selected LiDAR point, the selected LiDAR point is skipped, and a planar feature point is selected from the points with smaller curvature; if there are already edge feature points among the five points before and after the selected LiDAR point, the selected LiDAR point is skipped, and an edge feature point is selected from the points with larger curvature.

[0069] Furthermore, based on the feature points extracted from the lidar point cloud, a scan-to-scan method is used to achieve feature matching between frames.

[0070] Specifically, the point cloud detected by the lidar in the k-th scan is p. k The extracted set of edge feature points is E k The extracted set of planar feature points is denoted as H. k The point cloud detected in the (k+1)th scan is p. k+1 The extracted set of edge feature points is E k+1 The extracted set of planar feature points is denoted as H. k+1 .

[0071] Furthermore, for the set of edge feature points E k With E k+1 Use the following steps to associate:

[0072] In E k+1 Select a feature point i in E, and use a kd-tree to... k Find the point j closest to i, and at the same time find the point l closest to j on the adjacent scan line. At this time, the three points i, j and l are collinear and form a straight line. The data of the edge feature points are associated by constructing the distance residual from point i to the line in frame k+1.

[0073] Specifically, the residual function for edge point and line features is:

[0074]

[0075] in, X represents the coordinates of feature point i; (k,j) X represents the coordinates of feature point j; (k,l) Let l be the coordinates of the feature point.

[0076] Furthermore, for the set H of planar feature points k With H k+1 Use the following steps to associate:

[0077] In H k+1 Select a feature point from the data and use a kd-tree to analyze it in the H array. k Find the nearest point j of i, and find the second nearest point l on the same scan line. At the same time, find the nearest point m of j on the adjacent scan line. Points j, l and m form a plane. The data of the feature points of the plane are associated by constructing the distance residual from point i to the plane in frame k+1.

[0078] Specifically, the residual function for planar point-surface features is:

[0079]

[0080] in, X represents the coordinates of feature point i; (k,j) X represents the coordinates of feature point j; (k,l) X represents the coordinates of feature point l; (k,m) Let m be the coordinates of the feature point.

[0081] Furthermore, the residual functions of the line features of the edge feature points and the residual functions of the surface features of the planar feature points are optimized using the Jacobian matrix of the transformation matrix to obtain accurate robot localization results.

[0082] Furthermore, a three-dimensional point cloud map is constructed based on the laser point cloud data using mapping algorithms in laser SLAM.

[0083] Step S3: Based on the preprocessed point cloud information and the robot's real-time positioning information, use the load detection device to identify the point cloud information to obtain the target type and target location;

[0084] Specifically, such as Figure 2 As shown, the load detection device includes a laser rangefinder, a sensing camera, a shifting mechanism, and a processor. By adjusting the attitude of the shifting mechanism and the focal length of the sensing camera, the target in the designated area is clearly photographed and the target type is identified by the processor. The laser rangefinder is used to accurately measure the target position in the designated area.

[0085] Furthermore, the preprocessed point cloud information is sent to the load detection device as the target coarse location region information; after receiving the target coarse location region information, the load detection device automatically adjusts the attitude of the shifting mechanism and the focal length of the sensing camera to obtain a clear image of the point cloud information, and records the rotation matrix of the shifting mechanism at this time. and the focal length f of the sensing camera 0 The distance d between the robot and the location of the point cloud is measured using the laser rangefinder. 0 The rotation matrix between the target and the initial pose of the sensing payload detection device is obtained.

[0086]

[0087] Furthermore, based on the clear image, target recognition is performed using the target recognition model in the processor to obtain the target type of the image, including the following steps:

[0088] The clear image is preprocessed to adapt to the input format of the target recognition model; wherein, the preprocessing includes scaling and normalizing the input image.

[0089] The preprocessed image is input into a trained target recognition model to obtain the target detection result in the image; wherein, the target recognition model is a model using a YOLOv5 neural network.

[0090] Furthermore, the target detection results in the image obtained by the target recognition model are post-processed to obtain the final target detection result; wherein, the post-processing includes: non-maximum suppression and confidence screening.

[0091] It should be noted that the nonmaximum suppression is used to remove redundant detection boxes from the output of the YOLOv5 neural network model.

[0092] The confidence level refers to the degree of certainty by which the algorithm believes a target exists within the detection box. The confidence level formula is used to calculate the probability of a target existing within the detection box. Using the confidence level formula, we can determine the accuracy of the algorithm's detection results for different detection boxes, and thus select the results with high confidence levels as the target detection results.

[0093] Step S4: Obtain a semantic map by fusing the map, the target type, and the target location.

[0094] Specifically, when constructing a point cloud map, the laser SLAM checks whether the payload detection device has identified target information; when the payload detection device identifies target information, it performs joint calibration of the payload detection device and the lidar, including the following steps:

[0095] The intrinsic parameters of the sensing camera of the payload detection device are calibrated to obtain the intrinsic parameter matrix of the sensing camera:

[0096]

[0097] Where (u,v) are image coordinates; (x,y,z) are the coordinates of the payload camera coordinate system; f / dx is the length of the focal length in the x-axis direction described in pixels; f / dy is the length of the focal length in the y-axis direction described in pixels; f is the function relationship of the camera with respect to the pixel coordinates; and (u0,v0) is the pixel offset.

[0098] After determining the intrinsic parameters of the payload camera, a calibration board was used to calibrate the sensing payload and the LiDAR using Autoware software, thus establishing the extrinsic parameters of the camera and the LiDAR.

[0099]

[0100] Where R is the direction of the coordinate axis of the lidar coordinate system relative to the coordinate axis of the payload camera, and is a 3*3 rotation matrix; t is the position of the origin of the lidar coordinate system in the payload camera coordinate system, and is a 3*1 translation vector.

[0101] Furthermore, based on the calibration results, the lidar point cloud is projected onto the image of the payload detection device, and the lidar point cloud that is projected to the target location is selected from the lidar point cloud.

[0102] Specifically, the projection process involves transforming the lidar point cloud into the coordinate system of the payload detection device based on the calibration parameter matrix of the payload detection device and the lidar, thereby obtaining the point cloud coordinates in the coordinate system of the payload detection device; and projecting the lidar point cloud onto the imaging plane of the sensing camera based on the intrinsic parameters of the payload detection device.

[0103] Furthermore, the projection process formula is as follows:

[0104]

[0105] Among them, (X) L ,Y L Z L (u,v) represents the point cloud coordinates in the lidar coordinate system; (u,v) represents the pixel coordinates projected onto the imaging plane of the sensing camera. Let be the rotation matrix between the target and the initial pose of the sensing payload detection device.

[0106] Furthermore, selecting the lidar point cloud whose projection is located at the target position from the lidar point cloud includes:

[0107] The projected LiDAR point cloud is downsampled using voxel filtering. By controlling the size of the voxels, the number of point clouds can be effectively reduced while maintaining the point cloud contour features.

[0108] The downsampled lidar point cloud is subjected to Euclidean clustering, and a KD-Tree-based nearest neighbor query algorithm is used to preprocess the data to accelerate the Euclidean clustering.

[0109] The class with the most point clouds is selected as the type of the target point cloud.

[0110] Furthermore, by combining the installation relationship between the LiDAR and the robot platform and the target point cloud information, the relative positional relationship between the target and the robot is obtained, and the target type is updated on the LiDAR point cloud map to obtain a semantic map.

[0111] In summary, the collaborative detection and target identification method based on the robot's inherent sensors, as presented in this embodiment of the invention, addresses the challenge of identifying targets on a map using only LiDAR during real-time localization and mapping. This is because the discreteness and sparsity of laser point clouds and the limited sensing distance of remote-controlled cameras hinder the effective identification of targets on the map. By leveraging the wide field of view and high-frequency scanning capabilities of LiDAR, the method provides the robot's visual perception payload with rough target location information. The high accuracy and long-range detection results of the visual perception payload assist the special robot in identifying and labeling targets during real-time localization and mapping. Furthermore, this invention utilizes the special robot's self-contained high-performance visual perception payload, enabling high-precision target identification without the need for additional visual sensors. This method can be used for map construction by special robots in underground spaces or hazardous work areas.

[0112] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any changes or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in the present invention should be included within the scope of protection of the present invention.

Claims

1. A cooperative detection and target identification method based on robot's inherent sensors, characterized in that, The sensors include a lidar, an IMU measurement module, and a payload detection device mounted on the robot, and the method includes the following steps: Preprocessing of point cloud data based on lidar point cloud data and IMU measurement data yields preprocessed point cloud information. Based on the preprocessed point cloud information, laser SLAM is used to obtain the robot's real-time positioning information and construct a map; Based on the preprocessed point cloud information and the robot's real-time positioning information, the load detection device is used to identify the point cloud information to obtain the target type and target location; A semantic map is obtained by fusing the map, the target type, and the target location, including: When the load detection device identifies target information, the load detection device and the lidar are jointly calibrated. Based on the calibration results, the lidar point cloud is projected onto the image of the payload detection device, and the lidar point cloud that is projected to the target location is selected from the lidar point cloud. The target type is updated on the LiDAR point cloud map to obtain a semantic map; The joint calibration of the load detection device and the lidar includes: The intrinsic parameters of the sensing camera of the payload detection device are calibrated to obtain the intrinsic parameter matrix of the sensing camera: Where (u,v) are image coordinates; (x,y,z) are the coordinates of the payload camera coordinate system; f / dx is the length of the focal length in the x-axis direction described in pixels; f / dy is the length of the focal length in the y-axis direction described in pixels; and f is a function relating the pixel coordinates in the camera. This is the pixel offset; Establish the extrinsic parameters of the sensing camera and the lidar: Where R is the direction of the coordinate axis of the lidar coordinate system relative to the coordinate axis of the payload camera; t is the position of the origin of the lidar coordinate system in the payload camera coordinate system. Based on the calibration results, the lidar point cloud is projected onto the image of the payload detection device using the following formula: in, (u,v) represents the point cloud coordinates in the lidar coordinate system; (u,v) represents the pixel coordinates projected onto the imaging plane of the sensing camera. Let be the rotation matrix between the target and the initial pose of the sensing payload detection device; The step of selecting the lidar point cloud from the lidar point cloud whose projection is located at the target position includes: The projected lidar point cloud is filtered and downsampled. The downsampled lidar point cloud is clustered, and the class with the most point clouds is selected as the target point cloud.

2. The method according to claim 1, characterized in that, The load detection device includes a laser rangefinder, a sensing camera, a shifting mechanism, and a processor. By adjusting the attitude of the shifting mechanism and the focal length of the sensing camera, the target in the designated area is clearly photographed and the target type is identified by the processor. The laser rangefinder is used to accurately measure the target position in the designated area.

3. The method according to claim 2, characterized in that, The step of using the payload detection device to identify the target type and target location based on the preprocessed point cloud information and the robot's positioning information includes: Based on the preprocessed point cloud information, the load detection device automatically adjusts the posture of the rotation mechanism and the focal length of the sensing camera to obtain a clear image of the point cloud information. The laser rangefinder is used to measure the distance between the robot and the location of the point cloud to obtain the rotation matrix of the target and the initial posture of the sensing load detection device. Based on the clear image, the target type of the image is obtained by using the target recognition model in the processor to identify the target.

4. The method according to claim 3, characterized in that, The rotation matrix between the target and the initial pose of the sensing payload detection device is represented as: in, Let be the rotation matrix between the target and the initial pose of the sensing payload detection device; The orientation rotation matrix of the rotation mechanism; The distance between the robot and the location of the point cloud.

5. The method according to claim 3, characterized in that, The step of using a target recognition model to identify the target type of the image based on the clear image includes: The clear image is preprocessed to adapt to the input format of the target recognition model; The preprocessed image is input into a trained target recognition model to obtain the target detection result in the image; wherein, the target recognition model is a model using a YOLOv5 neural network.

6. The method according to claim 1, characterized in that, The point cloud preprocessing includes: Discrete radar points are removed from the lidar point cloud data, and filtering and downsampling are performed. After performing point cloud clustering on the filtered point cloud, IMU pre-integration estimation is used to complete the point cloud distortion removal, resulting in processed point cloud information.