A method and system for multi-source information fusion of air-ground unmanned systems under the demand of coastline inspection
By combining granular computing, multimodal feature extraction, and factor graph model, the problems of uncertainty in multi-source data and sensor failure in coastline inspection are solved, achieving efficient and reliable information fusion and target recognition, and improving the intelligence level of inspection tasks.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- LIAONING UNIVERSITY OF TECHNOLOGY
- Filing Date
- 2026-03-27
- Publication Date
- 2026-06-19
AI Technical Summary
Existing multi-source information fusion technologies lack an effective hierarchical information processing mechanism in coastline inspection scenarios, leading to information redundancy or loss of key features, insufficient sensor fault detection sensitivity, and severe interference from dynamic targets, thus affecting the accuracy and stability of inspections.
High-confidence information particles are selected through granular computation processing. Multimodal feature extraction and weighted fusion are performed using convolutional neural networks. A factor graph model is constructed for sensor fault detection and compensation. Fusion weights are dynamically generated by combining rough set decision rules to perform motion distortion correction and dynamic target removal. A global inspection map is constructed and specific targets are identified.
It improves the ability to quantify and hierarchically process uncertainties in multi-source heterogeneous data, enhances the ability to detect sensor faults, improves the intelligence level and operational efficiency of coastline patrol, and ensures the accuracy and stability of patrol tasks.
Smart Images

Figure CN122244618A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of computer technology, and in particular relates to a method and system for multi-source information fusion of an air-ground unmanned system for coastline inspection. Background Technology
[0002] Coastline patrol, a crucial means of maintaining border security and combating smuggling and illegal immigration, is characterized by complex terrain, dynamic and ever-changing environments, and high requirements for mission continuity. However, traditional manual patrol methods suffer from inherent drawbacks such as limited coverage, low efficiency, and high safety risks, making them insufficient to meet the precision and efficiency demands of modern patrols. With the development of unmanned aerial systems (UAVs), unmanned platforms equipped with multi-source sensors, including visible light cameras, infrared cameras, lidar, inertial measurement units, and global navigation satellite systems, are gradually becoming core equipment for coastline patrol. These platforms can achieve environmental perception, positioning and navigation, and target recognition through multi-source information fusion, providing key technological support for improving patrol capabilities. However, the drastic changes in lighting, complex weather conditions, and dense dynamic targets (such as passing ships, birds, and pedestrians) in coastline patrol scenarios lead to limitations such as incomplete information, uncertainty, and susceptibility to interference in data from a single sensor. Relying solely on a single data source cannot comprehensively and reliably depict the patrol environment and target characteristics. Therefore, multi-source information fusion technology is necessary to integrate the advantages of different sensors for high-precision environmental perception and target detection.
[0003] The application of existing multi-source information fusion technologies in coastline inspection scenarios still faces many problems that urgently need to be solved: First, given the uncertain characteristics of the coastline environment, existing fusion methods lack effective hierarchical information processing mechanisms and fail to fully integrate tools such as granular computing and rough set theory to perform hierarchical processing and uncertainty quantification of multi-source data, resulting in information redundancy or loss of key features, affecting the scientificity and accuracy of fusion decisions; Second, while multi-sensor combinations can improve the system's perception redundancy, they also increase the probability of sensor failure. Existing fault-tolerant navigation technologies have insufficient sensitivity to sensor failure detection and a single compensation mechanism, making it difficult to quickly isolate faulty data and generate reliable alternative information, seriously affecting the stability of unmanned system navigation and positioning; Third, the frequent appearance of dynamic targets in coastline inspections can seriously interfere with the pose estimation and map building of visual SLAM systems. Existing visual SLAM-based fusion algorithms are insufficient in terms of dynamic target removal accuracy and robustness, and lack precise identification and tracking schemes for specific inspection targets such as smuggling vessels and shore-based smugglers, resulting in distorted map building and low target recognition accuracy, failing to meet the core requirements of inspection tasks. Summary of the Invention
[0004] Based on this, it is necessary to provide a method and system for multi-source information fusion of air-ground unmanned systems under the requirements of coastline inspection, in order to address the above-mentioned technical problems. The aim is to improve the uncertainty quantification and hierarchical processing capabilities of multi-source heterogeneous data, enhance the real-time detection capability of sensor faults in complex sea conditions, and improve the intelligence level, operational efficiency and reliability of coastline inspection.
[0005] Firstly, this application provides a method for fusing multi-source information from an air-to-ground unmanned system to meet the requirements of coastline inspection, including:
[0006] The system acquires raw data from multiple sensors, performs spatiotemporal synchronization on the raw data to obtain a synchronized data frame, performs granular computation processing on the synchronized data frame, divides the data from each sensor in the synchronized data frame into information granules, calculates the generalized information entropy of each information granule, compares the generalized information entropy with a preset dynamic threshold, and filters out high-confidence information granules based on the comparison results to generate filtered data with confidence labels.
[0007] Multimodal feature extraction is performed on the filtered data. Feature maps of visible light images, infrared images and lidar point clouds are extracted from the filtered data through convolutional neural networks. The feature maps are weighted and fused using a confidence label-guided attention mechanism to generate a fused feature map and corresponding feature point association information. Global average pooling is then performed on the fused feature map to extract global environmental features.
[0008] A factor graph model is constructed based on feature point association information. The filtered data is used as the factor edge constraints of the factor graph model. The residual Mahalanobis distance of each factor edge in the factor graph model is calculated. Fault detection is performed on the sensor data in the filtered data based on the residual Mahalanobis distance, and the health status label of each sensor is output. The faulty sensor data in the filtered data is compensated according to the health status label to obtain the updated filtered data. Based on the sensor health status label and combined with global environmental features, the fusion weight of each sensor is dynamically generated through rough set decision rules. The updated filtered data is weighted and fused according to the fusion weight of each sensor to obtain the optimal pose estimate.
[0009] The optimal pose estimation is used to correct motion distortion in the updated filtered LiDAR point cloud data, resulting in corrected point cloud frames. Dynamic target removal is performed by combining the fused feature map to construct a global inspection map. Candidate regions are extracted based on the global inspection map and the fused feature map. The candidate regions and the fused feature map are input into the target detection network for target recognition to obtain the target position. The target trajectory is generated by combining the target information of consecutive frames with the optimal pose estimation. The target behavior is analyzed based on the target trajectory and the global inspection map to generate an abnormal event report.
[0010] In one embodiment, the information granules include image granules, point cloud granules, temporal granules, and location granules, and the synchronization data frame includes image data, point cloud data, inertial measurement unit data, and global navigation satellite system data. Granularity calculation processing is performed on the synchronization data frame, dividing the sensor data in the synchronization data frame into information granules, calculating the generalized information entropy of each information granule, comparing the generalized information entropy with a preset dynamic threshold, and filtering high-confidence information granules based on the comparison results to generate filtered data with confidence labels, including:
[0011] Image data in the synchronous data frame is divided into image particles using a saliency-based adaptive non-uniform grid. The grid size is assigned according to the local gradient variance of the image data. Spatial voxel grid is used to divide the point cloud data in the synchronous data frame into point cloud particles.
[0012] A sliding time window is used to divide the inertial measurement unit data in the synchronous data frame, and the sequence of measurement values within a preset time period is taken as a time granule to obtain multiple time granules;
[0013] The location points in the synchronized data frame of the Global Navigation Satellite System are clustered and divided into location point granules, and the location points within a continuous preset time window are clustered into location point granules.
[0014] For each information particle, the corresponding generalized information entropy is calculated; where the generalized information entropy of the image particle is calculated based on the gray-level histogram entropy and pixel standard deviation, the generalized information entropy of the point cloud particle is calculated based on the eigenvalues and intensity distribution of the point cloud covariance matrix, the generalized information entropy of the temporal particle is calculated based on the standard deviation of the rate of change of acceleration and angular velocity and autocorrelation, and the generalized information entropy of the location particle is calculated based on the degree of location dispersion.
[0015] The preset dynamic thresholds for each type of information particle are determined by using the sliding window percentile method. The upper and lower quartiles of the generalized information entropy of the same type of information particle in a preset number of data frames are calculated. The upper and lower quartiles are then substituted into the preset threshold adjustment formula to obtain the preset dynamic thresholds for each type of information particle.
[0016] Information particles whose generalized information entropy exceeds the corresponding preset dynamic threshold are marked as unreliable particles and removed to obtain the removed information particles; the removed information particles are marked as high-confidence information particles, and a confidence label is configured for each high-confidence information particle to generate filtered data containing image confidence mask, point cloud confidence label, inertial measurement unit confidence label and global navigation satellite system confidence label.
[0017] In one embodiment, a factor graph model is constructed based on feature point association information. The filtered data is used as the factor edge constraints of the factor graph model. The residual Mahalanobis distance of each factor edge in the factor graph model is calculated. Fault detection is performed on the sensor data in the filtered data based on the residual Mahalanobis distance, and the health status label of each sensor is output. The faulty sensor data in the filtered data is compensated according to the health status label to obtain the updated filtered data, including:
[0018] Feature point landmark nodes are determined based on feature point association information, and a factor graph model is constructed using the pose nodes of the unmanned system and the feature point landmark nodes as vertices.
[0019] The pre-integration factor is calculated based on the inertial measurement unit data in the filtered data, the reprojection factor is calculated based on the visual feature points in the feature point association information, the LiDAR point cloud computing matching factor is calculated based on the filtered data, and the positioning factor is calculated based on the Global Navigation Satellite System data in the filtered data.
[0020] The pre-integration factor, reprojection factor, matching factor, and localization factor are used as factor edge constraints in the factor graph model; the initial information matrix of each factor edge is obtained by weighting the confidence labels of the corresponding sensors in the filtered data.
[0021] The factor graph model is nonlinearly optimized to obtain the current optimal state estimate. The residual vector of each factor edge is calculated. The covariance matrix of the residual vector is determined based on the noise model of each sensor. The residual Mahalanobis distance of each factor edge is calculated based on the residual vector and the covariance matrix.
[0022] The residual Mahalanobis distance of each factor edge is compared with a preset chi-square test threshold. Factor edges whose residual Mahalanobis distance exceeds the preset chi-square test threshold are marked as abnormal edges. The time-series cumulative statistics of each factor edge corresponding to the same sensor in a consecutive preset number of frames are performed. If the proportion of abnormal edges exceeds a preset proportion threshold, the sensor is determined to be faulty and the output is a health status label of fault isolation state. Otherwise, the output is a health status label of normal state.
[0023] Sensors with a health status label indicating fault isolation are identified as faulty sensors. Abnormal edges and their corresponding initial information matrices are removed from the faulty sensors. Based on the data of sensors with a health status label indicating normal status in the filtered data, a prediction model is constructed by combining the historical valid measurement values of the faulty sensors. Compensation data for the faulty sensors is generated through the prediction model. The compensation data replaces the measurement values of the corresponding faulty sensors in the filtered data, resulting in updated filtered data.
[0024] In one embodiment, based on sensor health status labels and combined with global environmental features, fusion weights for each sensor are dynamically generated using rough set decision rules. The updated, filtered data is then weighted and fused according to these fusion weights to obtain the optimal pose estimate, including:
[0025] Acquire fusion effect data under different sensor state combinations and different environment types, and construct a rough set decision table. The rows of the rough set decision table represent the combination of sensor state combinations and environment types, and the columns represent the weight configuration of each sensor.
[0026] Sensor health status labels are used as conditional attributes, and global environmental features are used as environmental type representations. They are jointly input into a rough set decision table and matched using an identifiable matrix attribute reduction algorithm to obtain the fusion weight coefficients of each sensor corresponding to the current state.
[0027] Based on the factor graph model, the initial information matrix of the corresponding factor edge is scaled by the fusion weight coefficient of each sensor to obtain the weighted factor graph model; the weighted factor graph model is then solved by nonlinear optimization to obtain the optimal pose estimate and the corresponding covariance matrix of the current state.
[0028] Calculate the innovative Mahalanobis distance between the optimal pose estimate of the current state and the actual observations of each sensor in the updated filtered data. If the innovative Mahalanobis distance does not exceed the preset consistency threshold, the optimal pose estimate of the current state is output as the final optimal pose estimate. If the innovative Mahalanobis distance exceeds the preset consistency threshold, a backoff mechanism is triggered to reduce the confidence of each sensor and re-optimize the weighted factor graph model nonlinearly.
[0029] In one embodiment, optimal pose estimation is used to correct motion distortion in the updated filtered LiDAR point cloud data to obtain a corrected point cloud frame. This frame is then combined with a fused feature map for dynamic target removal, constructing a global inspection map, including:
[0030] Using optimal pose estimation, motion distortion correction is performed on the LiDAR point cloud in the updated filtered data. Each point in the LiDAR point cloud is transformed to the preset coordinate system according to the corresponding timestamp through pose interpolation to obtain the corrected point cloud frame.
[0031] The fused feature map is input into the semantic segmentation network to obtain the semantic category label of each pixel. Motion consistency check is performed on the point clouds in the corrected point cloud frame that correspond to the semantic category label of dynamic target. The position change of the point cloud with the semantic category label of dynamic target in multiple consecutive frames is compared with the motion of the unmanned system itself. If the matching relationship is inconsistent, the point cloud with the semantic category label of dynamic target is judged as a dynamic object and removed from the corrected point cloud frame to obtain the static point cloud frame.
[0032] Based on the optimal pose estimation, the static point cloud frame is transformed to a preset coordinate system to obtain the transformed static point cloud frame. The transformed static point cloud frame is then fused and downsampled with the local map accumulated during the inspection process to obtain the updated local map. Using the fused feature map, loop closure detection is performed on the updated local map and the historical frame map. When a loop closure is detected, global pose map optimization is triggered to adjust the historical pose of the unmanned system. The updated local map and the historical frame map are then integrated to generate global map data. Semantic category labels are overlaid on the global map data to construct a global inspection map.
[0033] In one embodiment, the candidate regions include candidate regions for smuggling vessels and candidate regions for shore-based smugglers. Candidate regions are extracted based on a global inspection map and a fused feature map. These candidate regions and the fused feature map are then input into a target detection network for target recognition to obtain the target location. Combined with target information from consecutive frames related to optimal pose estimation, a target trajectory is generated. Based on the target trajectory and the global inspection map, target behavior is analyzed to generate an anomaly event report, including:
[0034] Based on the spatial geometric constraints and semantic category labels of the global inspection map, and combined with the fused feature map, candidate regions for smuggling vessels are extracted in the sea area, and candidate regions for shore-based smugglers are extracted in the shore area.
[0035] The candidate areas for smuggling vessels and onshore smugglers are input together with the fused feature map into the target detection network for target recognition to obtain the target location.
[0036] By combining optimal pose estimation, the target position is transformed to a preset coordinate system, and the target position in consecutive frames is correlated to generate the target trajectory.
[0037] The target trajectory is matched with the location relationship between the restricted area boundary and the normal inspection area in the global inspection map to analyze whether the target has abnormal behavior and generate an abnormal event report.
[0038] In one embodiment, the fusion weighting coefficients of each sensor are dynamically generated using the following formula:
[0039]
[0040] in, For the first Frame sensor The fusion weight coefficient satisfies ; For data frame sequence number; The set of conditional attributes in the rough set decision table; The set of decision attributes in the rough set decision table; It is an approximate accuracy function; For conditional attribute set For decision attribute set Approximate accuracy; For sensors The corresponding health status attribute is a subset of the conditional attributes. One of the elements; To eliminate sensors Corresponding health status attributes Approximate accuracy after; A collection of sensors; For sensor set The traversal variable; For traversal variables The corresponding sensor health status attributes; For the first Frame sensor Health status labels; For sensor-based Correction factor for health status; For the first Global environmental characteristic variables of the frame; The adaptation coefficient is based on global environmental characteristics.
[0041] Secondly, this application also provides a multi-source information fusion system for air-to-ground unmanned systems to meet the needs of coastline inspection, including:
[0042] The spatiotemporal synchronization module is used to acquire raw data from multiple sources of sensors, perform spatiotemporal synchronization on the raw data from multiple sources of sensors to obtain a synchronized data frame; perform granular calculation processing on the synchronized data frame, divide the data from each sensor in the synchronized data frame into information granules, calculate the generalized information entropy of each information granule, compare the generalized information entropy with a preset dynamic threshold, and filter out high-confidence information granules based on the comparison results to generate filtered data with confidence labels;
[0043] The deep feature fusion module is used to extract multimodal features from the filtered data. It extracts feature maps of visible light images, infrared images and lidar point clouds from the filtered data through a convolutional neural network. It uses a confidence label-guided attention mechanism to perform weighted fusion of the feature maps to generate a fused feature map and corresponding feature point association information. It then performs global average pooling on the fused feature map to extract global environmental features.
[0044] The fault-tolerant localization and weight optimization module is used to construct a factor graph model based on feature point association information. The filtered data is used as the factor edge constraints of the factor graph model. The residual Mahalanobis distance of each factor edge in the factor graph model is calculated. Based on the residual Mahalanobis distance, fault detection is performed on the sensor data in the filtered data, and a health status label for each sensor is output. The faulty sensor data in the filtered data is compensated according to the health status labels to obtain updated filtered data. Based on the sensor health status labels and combined with global environmental features, the fusion weights of each sensor are dynamically generated through rough set decision rules. The updated filtered data is then weighted and fused according to the fusion weights of each sensor to obtain the optimal pose estimate.
[0045] The environmental perception and anomaly analysis module is used to correct motion distortion of the LiDAR point cloud in the updated and filtered data using optimal pose estimation, obtaining corrected point cloud frames. It then combines these with a fused feature map to perform dynamic target removal, constructing a global inspection map. Based on the global inspection map and the fused feature map, candidate regions are extracted. The candidate regions and the fused feature map are input into the target detection network for target recognition to obtain the target location. The module combines the target information of consecutive frames with optimal pose estimation to generate the target trajectory. Based on the target trajectory and the global inspection map, the module analyzes the target behavior and generates anomaly event reports.
[0046] Thirdly, this application also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the first aspect.
[0047] Fourthly, this application also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps in the first aspect.
[0048] The aforementioned method and system for multi-source information fusion in an air-to-ground unmanned system for coastline inspection first performs spatiotemporal synchronization and granular computation processing on the raw data from multiple sensors. High-confidence information granules are selected using generalized information entropy, providing a reliable data foundation for subsequent fusion decisions. Second, a confidence-label-guided attention mechanism is used for multimodal feature weighted fusion, overcoming the shortcomings of traditional methods such as one-sided feature extraction and fixed fusion weights, thus enhancing feature representation capabilities in complex environments. Furthermore, sensor fault detection and intelligent compensation are performed based on a factor graph model, combined with rough set decision-making to dynamically generate fusion weights, improving the robustness and accuracy of pose estimation. Finally, optimal pose estimation is used for motion distortion correction and dynamic target removal, constructing a global inspection map and achieving target recognition and behavior analysis, further enhancing the intelligence level of the inspection task and the ability to respond to abnormal events. Attached Figure Description
[0049] To more clearly illustrate the technical solutions in the embodiments or related technologies of this application, the accompanying drawings used in the description of the embodiments or related technologies will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0050] Figure 1 A flowchart of a multi-source information fusion method for an air-to-ground unmanned system under the requirements of coastline inspection, provided as an exemplary embodiment of the present invention;
[0051] Figure 2 A flowchart of a method for obtaining optimal pose estimation is provided as an exemplary embodiment of the present invention;
[0052] Figure 3 This is a schematic diagram of a multi-source information fusion system for an air-to-ground unmanned system under the requirements of coastline inspection, provided as an exemplary embodiment of the present invention. Detailed Implementation
[0053] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0054] In one embodiment, such as Figure 1 As shown, a multi-source information fusion method for an air-to-ground unmanned system is provided for coastline inspection requirements. This embodiment illustrates the method by applying it to a terminal. It is understood that this method can also be applied to a server, and to a system including both a terminal and a server, and implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps:
[0055] S101: Acquire raw data from multiple sources of sensors, perform spatiotemporal synchronization on the raw data from multiple sources of sensors to obtain a synchronized data frame; perform granular calculation processing on the synchronized data frame, divide the data from each sensor in the synchronized data frame into information granules, calculate the generalized information entropy of each information granule, compare the generalized information entropy with a preset dynamic threshold, filter out high-confidence information granules based on the comparison results, and generate filtered data with confidence labels.
[0056] Specifically, because the data acquisition frequencies and spatiotemporal references of the multiple sensors carried by the air-to-ground unmanned system, such as visible light cameras, infrared cameras, lidar, inertial measurement units, and global navigation satellite systems, differ, direct fusion can lead to data mismatch and error accumulation. Therefore, the raw data from the multiple sensors can be spatiotemporally synchronized first. By aligning timestamps and unifying coordinate systems, the measurement data from different sensors can be mapped to the same spatiotemporal reference, forming a synchronized data frame and ensuring the spatiotemporal consistency of the data. Furthermore, considering factors such as sudden changes in illumination and meteorological interference in the coastal environment, the multi-source data may contain redundant information and noise interference. A single data filtering method is insufficient to effectively quantify data reliability. Therefore, a granular computing processing method can be adopted. Based on the characteristics of different sensor data, such as the spatial correlation of image data, the three-dimensional distribution characteristics of point cloud data, the temporal continuity of inertial measurement unit data, and the positional clustering of global navigation satellite system data, the data from each sensor in the synchronized data frame can be divided into multiple information granules, achieving hierarchical data decomposition.
[0057] Subsequently, the generalized information entropy of each information particle can be calculated to quantify the orderliness and reliability of the data within the particle. Lower generalized information entropy indicates stronger data consistency and higher reliability. By comparing the generalized information entropy with a preset dynamic threshold, unreliable information particles whose generalized information entropy exceeds the threshold can be eliminated, while high-confidence information particles are retained and assigned confidence labels. This process effectively filters redundant and interfering data, improves the reliability of the input data, and lays the foundation for subsequent feature extraction and fusion.
[0058] S102: Multimodal feature extraction is performed on the filtered data. Feature maps of visible light images, infrared images and lidar point clouds are extracted from the filtered data through a convolutional neural network. The feature maps are weighted and fused using a confidence label-guided attention mechanism to generate a fused feature map and corresponding feature point association information. Global average pooling is then performed on the fused feature map to extract global environmental features.
[0059] Specifically, different sensor modalities have complementary characteristics. For example, visible light images have rich texture details but are easily affected by lighting conditions; infrared images can still capture target outlines at night or in adverse weather conditions; and LiDAR point clouds can provide accurate 3D geometric information but have weak semantic expression. However, extracting features from only a single modality cannot fully characterize the inspection environment and target attributes. Therefore, convolutional neural networks can be used to extract modality-specific feature maps from the filtered data, namely visible light images, infrared images, and LiDAR point clouds. Considering the differences in confidence levels among different information particles, directly fusing feature maps with equal weights would introduce interference from unreliable features. Therefore, a confidence label generated by S101 can be used to guide the attention mechanism, weighting feature maps of different modalities and confidence levels. This makes the fusion process focus more on high-confidence features and suppress the influence of low-confidence features, thereby generating a fused feature map that combines the advantages of each modality. Furthermore, feature point association information can be obtained through feature matching, providing geometric constraints for subsequent pose estimation. In addition, global average pooling can be performed on the fused feature map to extract global environmental features that can characterize the overall characteristics of the current inspection scenario, so as to adapt to the need for dynamic adjustment of fusion weights based on environment type in the future.
[0060] S103: Construct a factor graph model based on feature point association information, use the filtered data as factor edge constraints of the factor graph model, calculate the residual Mahalanobis distance of each factor edge in the factor graph model, perform fault detection on each sensor data in the filtered data based on the residual Mahalanobis distance, output the health status label of each sensor, compensate the faulty sensor data in the filtered data according to the health status label, and obtain the updated filtered data; based on the sensor health status label and combined with global environmental features, dynamically generate the fusion weight of each sensor through rough set decision rules, and perform weighted fusion estimation on the updated filtered data according to the fusion weight of each sensor to obtain the optimal pose estimation.
[0061] Specifically, pose estimation is the core of navigation and mapping in unmanned systems. A factor graph model can be constructed based on the feature point association information output by the S102. The pose nodes and feature point landmark nodes of the unmanned system are used as vertices, and the pre-integration results of the inertial measurement unit, the reprojection relationship of visual feature points, the matching results of LiDAR point clouds, and the positioning data of the Global Navigation Satellite System in the filtered data are used as factor edge constraints, forming a geometric association model of multi-source data. While multi-sensor combinations can improve system redundancy, they also increase the probability of sensor failure. Faulty data can severely disrupt the constraint relationships of the factor graph model, leading to distortion in pose estimation. Therefore, the reliability of factor edge constraints can be quantified by calculating the residual Mahalanobis distance of each factor edge. Based on this distance, the validity of each sensor data in the filtered data can be detected, faulty sensors can be identified, and health status labels can be output. Simultaneously, compensation data can be generated using the measurement data of healthy sensors to replace the abnormal data of faulty sensors, resulting in updated filtered data, achieving fault isolation and data repair. Furthermore, considering the dynamic changes in sensor health status and inspection environment, fixed fusion weights cannot adapt to complex scenarios. Therefore, sensor health status labels and global environmental features extracted by S102 can be used as inputs. Rough set decision rules are used to mine the mapping relationship between sensor status, environment type, and fusion weights, dynamically generating fusion weights for each sensor. These weights are then used to scale the factor edge information matrix of the factor graph model, and the optimal pose estimation is obtained through nonlinear optimization. This optimization process achieves dynamic adaptation of the fusion strategy, avoiding the influence of faulty sensors and adjusting the fusion emphasis according to environmental changes, significantly improving the robustness and accuracy of pose estimation.
[0062] S104: Optimal pose estimation is used to correct motion distortion in the updated filtered LiDAR point cloud data to obtain corrected point cloud frames. Dynamic target removal is performed by combining the fused feature map to construct a global inspection map. Candidate regions are extracted based on the global inspection map and the fused feature map. The candidate regions and the fused feature map are input into the target detection network for target recognition to obtain the target position. Target information from consecutive frames is associated with optimal pose estimation to generate target trajectories. Target behavior is analyzed based on the target trajectory and the global inspection map to generate an abnormal event report.
[0063] Specifically, the LiDAR point cloud data is distorted during acquisition due to the movement of the unmanned system. Directly using this data for mapping leads to geometric distortion. Therefore, the optimal pose estimation obtained from S103 can be used to transform the point cloud to a unified coordinate system through pose interpolation based on the timestamps of each point, thus correcting motion distortion. Furthermore, the coastline inspection environment contains numerous dynamic targets such as passing ships, birds, and pedestrians, which interfere with static map construction. Therefore, the fused feature map output from S102 can be combined with semantic segmentation to obtain pixel-level semantic category labels. Motion consistency checks are then performed on point clouds semantically labeled as dynamic targets in the corrected point cloud frames, eliminating dynamic point clouds that do not match the unmanned system's own motion, resulting in static point cloud frames and ensuring the static geometric accuracy of the map. After transforming the static point cloud frames to a preset coordinate system, they can be fused and downsampled with the local maps accumulated during the inspection process. Loop closure detection is then performed using the fused feature map to eliminate accumulated errors. By overlaying semantic labels onto the sampled local map and the detected fused feature map, a global inspection map containing static geometric information and high-level semantic information can be constructed, providing spatial constraints and semantic support for object detection.
[0064] Furthermore, based on the spatial partitioning characteristics (sea surface and shore) and semantic labels of the global inspection map, candidate regions for smuggling vessels and shore-based smugglers can be extracted by combining fused feature maps. This narrows the target detection range and reduces background interference. The candidate regions and fused feature maps are then input into the target detection network to achieve accurate identification of specific targets and obtain their locations. Subsequently, optimal pose estimation can be used to transform the target location to a unified coordinate system, and target trajectory can be generated by associating target information from consecutive frames. By comparing the target trajectory with the positional relationship between restricted area boundaries and normal inspection areas in the global inspection map, it can be analyzed whether the target has violated regulations such as trespassing into restricted areas or abnormally loitering, generating an anomaly report. This report provides accurate evidence for coastline patrol and law enforcement, significantly improving the practical effectiveness of patrol missions.
[0065] The aforementioned method effectively addresses the issues of high uncertainty and redundant interference in multi-source data in coastal environments by performing granular computation processing and generalized information entropy filtering on raw data from multiple sensors to generate filtered data with confidence labels. Secondly, it extracts multimodal features and guides attention fusion using confidence scores to extract global environmental features, overcoming the shortcomings of single-modal feature representation and poor environmental adaptability. Furthermore, it implements sensor fault detection and compensation based on a factor graph model, and dynamically generates fusion weights using rough set decision rules, effectively solving the problems of weak robustness in traditional fixed-weight fusion and the impact of sensor faults on navigation accuracy. Finally, it constructs a global inspection map through dynamic target elimination, accurately extracts candidate regions, and completes specific target identification, tracking, and anomaly analysis, enhancing the accuracy of the inspection map and the targeting of target identification, thereby improving the perception accuracy and practical effectiveness of coastal patrols in complex environments.
[0066] In one embodiment, information granules may include image granules, point cloud granules, temporal granules, and location granules, and the synchronization data frame may include image data, point cloud data, inertial measurement unit data, and global navigation satellite system data, etc. Granularity calculation processing is performed on the synchronization data frame, dividing the sensor data in the synchronization data frame into information granules, calculating the generalized information entropy of each information granule, comparing the generalized information entropy with a preset dynamic threshold, and filtering high-confidence information granules based on the comparison result to generate filtered data with confidence labels, which may include:
[0067] Image data in the synchronous data frame is divided into image particles using a saliency-based adaptive non-uniform grid. The grid size is assigned according to the local gradient variance of the image data. Spatial voxel grid is used to divide the point cloud data in the synchronous data frame into point cloud particles.
[0068] A sliding time window is used to divide the inertial measurement unit data in the synchronous data frame, and the sequence of measurement values within a preset time period is taken as a time granule to obtain multiple time granules;
[0069] The location points in the synchronized data frame of the Global Navigation Satellite System are clustered and divided into location point granules, and the location points within a continuous preset time window are clustered into location point granules.
[0070] For each information particle, the corresponding generalized information entropy is calculated; where the generalized information entropy of the image particle is calculated based on the gray-level histogram entropy and pixel standard deviation, the generalized information entropy of the point cloud particle is calculated based on the eigenvalues and intensity distribution of the point cloud covariance matrix, the generalized information entropy of the temporal particle is calculated based on the standard deviation of the rate of change of acceleration and angular velocity and autocorrelation, and the generalized information entropy of the location particle is calculated based on the degree of location dispersion.
[0071] The preset dynamic thresholds for each type of information particle are determined by using the sliding window percentile method. The upper and lower quartiles of the generalized information entropy of the same type of information particle in a preset number of data frames are calculated. The upper and lower quartiles are then substituted into the preset threshold adjustment formula to obtain the preset dynamic thresholds for each type of information particle.
[0072] Information particles whose generalized information entropy exceeds the corresponding preset dynamic threshold are marked as unreliable particles and removed to obtain the removed information particles; the removed information particles are marked as high-confidence information particles, and a confidence label is configured for each high-confidence information particle to generate filtered data containing image confidence mask, point cloud confidence label, inertial measurement unit confidence label and global navigation satellite system confidence label.
[0073] Specifically, when partitioning image data, a saliency-based adaptive non-uniform mesh partitioning strategy can characterize the texture complexity and information density of local image regions by calculating the gradient variance. Regions with higher gradient variance correspond to scene areas with rich texture and high information content, and can be assigned smaller mesh sizes to retain detailed information. Conversely, regions with lower gradient variance correspond to smooth, information-redundant scene areas, and can be assigned larger mesh sizes to compress redundant data. Ultimately, the image data is divided into multiple image particles, each corresponding to a local region in the image with spatial correlation and information consistency. For point cloud data, a spatial voxel mesh partitioning method can be used. Based on the three-dimensional distribution range and resolution of the point cloud, the voxel resolution is set, dividing the three-dimensional space into regular voxel units. The set of point clouds within each voxel unit constitutes a point cloud particle, which can characterize the geometric structure and point distribution characteristics of the local three-dimensional space. For inertial measurement unit (IMU) data, a sliding time window approach can be used. A fixed-length time window is set and slid along the time axis of the IMU data. The acceleration and angular velocity measurement sequences within each time window are encapsulated into a time-series granule. This granule reflects the temporal continuity and trend of the IMU data over a short period. For Global Navigation Satellite System (GNSS) data, a location point clustering approach can be used. Based on the coordinates of location points within a continuous preset time window, density clustering algorithms are used to cluster the location points. The resulting set of clustered location points is treated as a location point granule. This granule characterizes the location clustering and stability of GNSS positioning results over a short period.
[0074] After information granulation is completed, the corresponding generalized information entropy can be calculated based on the characteristics of different types of information granules to quantify the orderliness and reliability of the data within each information granule. For example, for image granules, their generalized information entropy can be composed of gray-level histogram entropy and pixel standard deviation. The gray-level histogram entropy reflects the dispersion of pixel gray-level distribution within the image granule and can be obtained by calculating Shannon entropy from the histogram distribution of pixel gray-level values within the image granule. The pixel standard deviation reflects the fluctuation of pixel gray-level within the image granule. The combination of the two can comprehensively characterize the information consistency and noise level of the image granule. For point cloud granules, their generalized information entropy can be composed of the eigenvalues of the point cloud covariance matrix and the intensity value distribution. The eigenvalues of the point cloud covariance matrix reflect the three-dimensional distribution of the point cloud within the point cloud granule, while the intensity value distribution reflects the dispersion of the point cloud reflection intensity within the point cloud granule. The combination of the two can quantify the geometric stability and reflection information consistency of the point cloud granule. For time-series granules, their generalized information entropy can be composed of the standard deviation of the rate of change of acceleration and angular velocity, and autocorrelation. The standard deviation of the rate of change reflects the fluctuation amplitude of the inertial measurement unit data, while the autocorrelation reflects the temporal continuity of the data. The combination of the two can characterize the measurement stability and trend consistency of the inertial measurement unit data within the time-series granules. For location-point granules, their generalized information entropy can be calculated based on the dispersion of the location points. The degree of clustering of location points is quantified by the variance or standard deviation of the location point coordinates. The higher the dispersion, the more unstable the positioning results, and the higher the generalized information entropy.
[0075] Furthermore, when determining the preset dynamic threshold for each type of information particle, a fixed-length sliding window can be set first and slid along the data frame sequence. Within each window, the generalized information entropy set of the same type of information particle is statistically analyzed, and the upper and lower quartiles of this set are calculated. The upper quartile corresponds to the 75th quartile of the generalized information entropy within the window, and the lower quartile corresponds to the 25th quartile. The difference between the two is the interquartile range, which characterizes the dispersion of the generalized information entropy within the window. Substituting the upper and lower quartiles into the preset threshold adjustment formula, the preset dynamic threshold for that type of information particle within the current window can be obtained. Its calculation can be expressed as follows: , For the first The preset dynamic threshold of information granules, For the first in the sliding window The upper quartile of the generalized information entropy of information granular particles. For the first in the sliding window The lower quartile of the generalized information entropy of information granularity. This is the threshold adjustment coefficient, used to control the strictness of the threshold. Its value range can be set according to scenario requirements to balance noise filtering and retention of effective information. This threshold can adapt to the statistical characteristics of the data within the window, avoiding the insufficient adaptability of a fixed threshold when the scenario changes.
[0076] Specifically, the generalized information entropy of each information particle can be compared with a preset dynamic threshold for the corresponding type. If the generalized information entropy exceeds the preset dynamic threshold, the information particle is marked as unreliable, indicating that its internal data contains significant noise or mismatch, and is therefore removed. If the generalized information entropy does not exceed the preset dynamic threshold, it can be marked as a high-confidence information particle, indicating that its internal data has high consistency and reliability. For each high-confidence information particle, a confidence label can be configured according to the magnitude of its generalized information entropy. The lower the generalized information entropy, the higher the confidence, and the value range of the confidence label is normalized to the [0,1] interval. Finally, by integrating the high-confidence information particles of each modality and their corresponding confidence labels, filtered data containing image confidence masks, point cloud confidence labels, inertial measurement unit confidence labels, and global navigation satellite system confidence labels can be generated. This filtered data eliminates the interference of unreliable information and retains highly reliable multi-source data, providing an input basis for subsequent multi-modal feature extraction and fusion.
[0077] In one embodiment, a factor graph model is constructed based on feature point association information. The filtered data is used as factor edge constraints in the factor graph model. The residual Mahalanobis distance of each factor edge in the factor graph model is calculated. Fault detection is performed on the sensor data in the filtered data based on the residual Mahalanobis distance, and a health status label for each sensor is output. The faulty sensor data in the filtered data is compensated based on the health status labels to obtain updated filtered data, including:
[0078] Feature point landmark nodes are determined based on feature point association information, and a factor graph model is constructed using the pose nodes of the unmanned system and the feature point landmark nodes as vertices.
[0079] The pre-integration factor is calculated based on the inertial measurement unit data in the filtered data, the reprojection factor is calculated based on the visual feature points in the feature point association information, the LiDAR point cloud computing matching factor is calculated based on the filtered data, and the positioning factor is calculated based on the Global Navigation Satellite System data in the filtered data.
[0080] The pre-integration factor, reprojection factor, matching factor, and localization factor are used as factor edge constraints in the factor graph model; the initial information matrix of each factor edge is obtained by weighting the confidence labels of the corresponding sensors in the filtered data.
[0081] The factor graph model is nonlinearly optimized to obtain the current optimal state estimate. The residual vector of each factor edge is calculated. The covariance matrix of the residual vector is determined based on the noise model of each sensor. The residual Mahalanobis distance of each factor edge is calculated based on the residual vector and the covariance matrix.
[0082] The residual Mahalanobis distance of each factor edge is compared with a preset chi-square test threshold. Factor edges whose residual Mahalanobis distance exceeds the preset chi-square test threshold are marked as abnormal edges. The time-series cumulative statistics of each factor edge corresponding to the same sensor in a consecutive preset number of frames are performed. If the proportion of abnormal edges exceeds a preset proportion threshold, the sensor is determined to be faulty and the output is a health status label of fault isolation state. Otherwise, the output is a health status label of normal state.
[0083] Sensors with a health status label indicating fault isolation are identified as faulty sensors. Abnormal edges and their corresponding initial information matrices are removed from the faulty sensors. Based on the data of sensors with a health status label indicating normal status in the filtered data, a prediction model is constructed by combining the historical valid measurement values of the faulty sensors. Compensation data for the faulty sensors is generated through the prediction model. The compensation data replaces the measurement values of the corresponding faulty sensors in the filtered data, resulting in updated filtered data.
[0084] Specifically, feature point landmark nodes can be determined first based on feature point association information. For example, stable cross-frame matching feature points can be extracted from the fused feature map of visible light images, infrared images, and lidar point clouds. Feature points that are spatially unique and not easily affected by dynamic environmental changes in the coastline inspection scenario can be selected as landmark nodes. Each feature point landmark node represents a fixed geometric feature in the 3D inspection environment, and its coordinate information remains relatively stable in the global coordinate system. The pose node of the unmanned system corresponds to the pose state of the unmanned system at each sampling time, which can include 3D position coordinates and rotation attitude. The pose node and the feature point landmark node together constitute the vertex set of the factor graph model. Geometric constraints can be established between vertices through factor edges. The factor graph model is constructed in the form of a graph topology structure. This model intuitively describes the spatial association between the pose of the unmanned system and environmental feature points, providing a structured carrier for the fusion constraints of multi-source sensor measurement data.
[0085] Subsequently, various factor constraints can be calculated based on the multi-source sensor data from the filtered data. For example, for inertial measurement unit (IMU) data, a pre-integration method can be used to calculate the pre-integration factor. However, the raw acceleration and angular velocity data collected by the IMU exhibit integration drift characteristics. Directly integrating the pose values at adjacent time points can lead to long-term accumulated pose estimation errors. Therefore, the IMU measurements between adjacent pose nodes can be pre-integrated to obtain the relative pose transformation, velocity change, and offset between adjacent nodes, thus constructing a pre-integration factor. This factor can eliminate the influence of integration drift between pose nodes at different time points and improve the accuracy of temporal constraints. For visual feature point association information, feature point landmark nodes can be projected onto the visible light and infrared image planes of the current frame to obtain the coordinates of the projected points. The pixel coordinate deviation between the projected points and the actual image feature points is calculated to construct a reprojection factor. This reprojection factor quantifies the spatial consistency of feature point projections under the visual modality and is the core constraint of the visual sensor on the pose of the unmanned system. For LiDAR point cloud data, an iterative nearest-point algorithm can be used to register the current frame's LiDAR point cloud with the local map point cloud, obtaining the rigid body transformation relationship between the point clouds and constructing a matching factor. This matching factor, based on the LiDAR's 3D geometric information, can provide high-precision pose constraints, compensating for the insufficient feature representation of visual modalities in areas lacking coastline texture. For Global Navigation Satellite System (GNSS) data, the absolute position and velocity information of the unmanned system can be obtained based on GNSS measurement results. The deviation between this absolute information and the theoretical value of the unmanned system's pose estimation is calculated to construct a positioning factor. This positioning factor provides global absolute position constraints, effectively suppressing cumulative drift during the pose estimation process.
[0086] Furthermore, the aforementioned pre-integration factor, reprojection factor, matching factor, and localization factor can be used as factor edge constraints in the factor graph model. The initial information matrix of each factor edge is used to characterize the confidence and reliability of the corresponding factor edge constraint. Its value is negatively correlated with the covariance of the sensor measurement noise. The confidence label ranges from [0,1]. A higher confidence level indicates stronger reliability of the sensor data and a stronger constraint strength for the corresponding factor edge. For example, the confidence label for a vision sensor is... Then the initial information matrix of the visual reprojection factor can be expressed as: in This is the reference information matrix for the vision sensor, determined by the sensor's factory calibration parameters, representing the constraint strength of visual feature point projections under noise-free conditions. This weighting method enables adaptive adjustment of constraint strength based on data reliability, avoiding interference from low-confidence data in the factor graph optimization process and ensuring the accuracy of the optimization results.
[0087] Specifically, based on the constructed factor graph model, the Gauss-Newton method or the Levenberg-Marquardt method can be used as the solution algorithm for nonlinear optimization. This involves iteratively solving for the optimal state estimates of the pose nodes and feature point landmark nodes of the unmanned system by minimizing the sum of squared residuals of all factor edges. After optimization, the residual vector of each factor edge is calculated. The residual vector represents the deviation between the observed value of the factor edge and the theoretical value under the optimal state estimate. For example, the residual vector of the pre-integration factor represents the deviation between the relative pose obtained from pre-integration and the optimal pose estimate, while the residual vector of the reprojection factor represents the pixel coordinate deviation between the projected point and the actual image feature point. The covariance matrix of the residual vector can be determined based on the noise models of each sensor. The sensor noise model is jointly determined by the factory calibration parameters and actual operating conditions. For example, the noise model of the inertial measurement unit includes acceleration white noise and angular velocity random walk, while the noise model of the global navigation satellite system includes position measurement noise and velocity measurement noise. The covariance matrix of the residual vector is derived from the sensor noise model. Based on the residual vector and covariance matrix, the residual Mahalanobis distance of each factor edge can be calculated. The formula for calculating the residual Mahalanobis distance is:
[0088]
[0089] in, The residual Mahalanobis distance of the factor edges is used to quantify the anomaly of the residual vector; The residual vector is the factor edge, and its dimension is determined by the type of factor edge. The residual vector dimension of the pre-integrated factor is 6, the residual vector dimension of the reprojection factor is 2, and the residual vector dimension of the lidar matching factor and the global navigation satellite system positioning factor is 3. The covariance matrix of the residual vector is related to the sensor noise characteristics; This is the inverse of the covariance matrix. The residual Mahalanobis distance takes into account the correlation between the components of the residual vector. Compared with the Euclidean distance, it can more accurately reflect the degree of anomaly of the residual. The larger the residual Mahalanobis distance, the greater the deviation between the constraint of the factor edge and the optimal pose estimation, and the lower the reliability of the sensor data corresponding to that factor edge.
[0090] Specifically, the residual Mahalanobis distance of each factor edge can be compared with a preset chi-square test threshold. This chi-square test threshold is determined by the chi-square distribution, and its degrees of freedom are consistent with the dimension of the residual vector. The significance level can be set according to the actual application scenario of coastline inspection, and it is used to determine the abnormal boundary of the residual Mahalanobis distance. If the residual Mahalanobis distance of a factor edge is greater than the preset chi-square test threshold, the factor edge can be marked as an abnormal edge, indicating that the corresponding sensor data is abnormal. Subsequently, time-series cumulative statistics can be performed on all factor edges corresponding to the same sensor in a consecutive preset number of frames to calculate the proportion of abnormal edges. The proportion of abnormal edges is the ratio of the number of abnormal edges of the sensor in the consecutive frames to the total number of corresponding factor edges. If the proportion of abnormal edges exceeds a preset proportion threshold, the sensor can be determined to be faulty, and a health status label of fault isolation status can be output, indicating that the measurement data of the sensor is unreliable and fault isolation processing is required. If the proportion of abnormal edges does not exceed the preset proportion threshold, the sensor can be determined to be in normal working condition, and a health status label of normal state can be output. This time-series cumulative statistics process can avoid misjudgment caused by single-frame data anomalies and improve the robustness and accuracy of sensor fault detection.
[0091] Furthermore, after sensor fault detection is completed, the data from the faulty sensors can be compensated and the filtered data updated. For example, all abnormal edges corresponding to sensors with a health status label of fault isolation can be removed first, along with the initial information matrix corresponding to these abnormal edges, to prevent faulty data from continuously interfering with the optimization process of the factor graph model. Based on the valid measurement data of sensors with a health status label of normal in the filtered data, a prediction model can be constructed by combining the historical valid measurement values of the faulty sensors. The prediction model can be a time series prediction model or a Kalman prediction model based on multi-sensor fusion. It can utilize the temporal and spatial correlation characteristics of normal sensor data to perform temporal and feature correlation prediction on the measurement values of the faulty sensors, generating compensated data for the faulty sensors. Replacing the original measurement values of the corresponding faulty sensors in the filtered data with the generated compensated data yields updated filtered data. This updated filtered data eliminates the interference of faulty data, retains reliable measurement data from normal sensors, and fills in the data gaps of faulty sensors through compensated data, ensuring the continuity and accuracy of subsequent multimodal feature fusion, pose estimation, and global inspection map construction processes.
[0092] In one embodiment, such as Figure 2 As shown, based on sensor health status labels and combined with global environmental features, fusion weights for each sensor are dynamically generated using rough set decision rules. The updated, filtered data is then weighted and fused according to these sensor weights to obtain the optimal pose estimate, which may include:
[0093] S201: Obtain fusion effect data under different sensor state combinations and different environment types, and construct a rough set decision table. The rows of the rough set decision table represent the combination of sensor state combinations and environment types, and the columns represent the weight configuration of each sensor.
[0094] S202: The sensor health status label is used as a conditional attribute, and the global environmental features are used as an environmental type representation. They are input into the rough set decision table and matched by the identifiable matrix attribute reduction algorithm to obtain the fusion weight coefficients of each sensor corresponding to the current state.
[0095] S203: Based on the factor graph model, the initial information matrix of the corresponding factor edge is scaled by the fusion weight coefficient of each sensor to obtain the weighted factor graph model; the weighted factor graph model is solved by nonlinear optimization to obtain the optimal pose estimate of the current state and the corresponding covariance matrix.
[0096] S204: Calculate the innovative Mahalanobis distance between the optimal pose estimate of the current state and the actual observations of each sensor in the updated filtered data. If the innovative Mahalanobis distance does not exceed the preset consistency threshold, the optimal pose estimate of the current state is output as the final optimal pose estimate. If the innovative Mahalanobis distance exceeds the preset consistency threshold, a backoff mechanism is triggered to reduce the confidence of each sensor and re-optimize the weighted factor graph model nonlinearly.
[0097] Specifically, sensor state combinations can include combinations of sensors in normal working state, fault isolation state, and partially failed state. Environmental types can cover typical scenarios in coastline inspections, such as open sea areas, coastal vegetation-covered areas, areas with complex weather interference, and nearshore shallow water areas. The fusion effect data can be evaluated using the mean square error of pose estimation, feature matching success rate, global map construction accuracy, and target detection accuracy as core indicators. This data is obtained through offline experiments and online inspection data collection, including fusion effect sample data under different combinations. Subsequently, a rough set decision table can be constructed based on the fusion effect data. The rows of this decision table represent the combinations of sensor state combinations and environmental types, with each row corresponding to a unique mapping relationship between sensor state combinations and environmental types. The columns represent the weight configuration of each sensor, with column elements representing the optimal weight value for each sensor under the corresponding combination. The columns can also include fusion effect evaluation indicators as supporting decision attributes. Through this decision table, an explicit mapping relationship between sensor state, environmental type, and fusion weights can be established, providing data support for subsequent weight generation.
[0098] Furthermore, sensor health status labels and global environmental features can be used as core input parameters into a rough set decision table. Through an identifiable matrix attribute reduction algorithm, core attributes are selected and weights are dynamically allocated based on the approximate accuracy of the rough set. For example, the fusion weight coefficients of each sensor can be dynamically generated using the following formula:
[0099]
[0100] in, For the first Frame sensor The fusion weight coefficient satisfies ; This is the data frame sequence number, corresponding to the continuous data frames collected by the unmanned system during the coastline inspection process; It is the set of conditional attributes in the rough set decision table, which includes the health status attributes of each sensor and the global environmental characteristic attributes; This is the set of decision attributes in the rough set decision table, corresponding to the evaluation index of fusion effect, used to characterize the optimization objective of fusion weight; It is an approximate accuracy function; For conditional attribute set For decision attribute set The approximate accuracy represents the overall explanatory power of all current conditional attributes for the fusion effect; For sensors The corresponding health status attribute is a set of conditional attributes. One of the elements, whose values are either normal state or fault isolation state, is used to characterize the sensor. operational reliability; To eliminate sensors Corresponding health status attributes The conditional attribute subset is compared with the decision attribute set. The approximate accuracy, through this approximate accuracy and The difference can quantify the sensor The greater the difference between the health status attributes of the sensor and the fusion weights, the higher the influence of the sensor's health status on the fusion weights. The set of sensors involved in the fusion includes all sensors involved in pose fusion, such as visible light cameras, infrared cameras, lidar, inertial measurement units, and global navigation satellite systems. For sensor set The traversal variable; For traversal variables The corresponding sensor health status attributes are conditional attribute sets. Zhongyu The corresponding element; For the first Frame sensor The health status label, whose value ranges from a normal status indicator to a fault isolation status indicator, is a quantitative representation of the sensor's health status.
[0101] Indicatively, in the above formula For sensor-based The health status correction coefficient is dynamically adjusted based on the sensor's health status label. When the sensor is in a normal state, the correction coefficient is higher, and when the sensor is in a fault isolation state, the correction coefficient is zero or extremely low. This correction coefficient can directly shield the weight contribution of faulty sensors and avoid interference from faulty data in the fusion process. For the first The global environmental feature variables of the frame are represented by the global environmental features output from the multimodal feature extraction stage, including feature information such as the texture complexity of the coastline environment, three-dimensional geometric structure features, and dynamic target density. This is an adaptation coefficient based on global environmental characteristics. This coefficient is dynamically adjusted according to the type of global environmental characteristics. For example, in open sea areas, this coefficient is tilted towards sensors corresponding to lidar and global navigation satellite systems, while in areas with vegetation covering the coastline, this coefficient is tilted towards sensors corresponding to infrared cameras and inertial measurement units. Through this adaptation coefficient, the fusion weights can be adaptively adjusted to different coastline environments.
[0102] Based on the identifiable matrix attribute reduction algorithm, an identifiable matrix can first be constructed using a rough set decision table. The elements of the identifiable matrix represent the degree of difference between any two combined samples on a conditional attribute. If two samples have different values on a certain conditional attribute, the corresponding position of that element is 1; otherwise, it is 0. Subsequently, the conditional attribute set is solved using the identifiable matrix. Minimal reduction, eliminating dependencies on decision attributes Redundant attributes that do not contribute are excluded, while core sensor health status attributes and global environmental feature attributes are retained. Finally, based on the reduced core attributes, the sensor health status label and global environmental features of the current inspection frame are matched, and the fusion weight coefficients of each sensor corresponding to the current state are solved through rough set decision rules. This process achieves accurate matching between fusion weights and sensor states and environmental features, avoiding the problem of insufficient adaptability of fixed weights in complex coastline scenarios.
[0103] Specifically, after obtaining the fusion weight coefficients of each sensor, the initial information matrix of the corresponding factor edge can be scaled using these coefficients. The initial information matrix represents the original confidence level of the factor edge constraint. By multiplying the fusion weight coefficients by the initial information matrix, the strength of the factor edge constraint can be dynamically adjusted. Sensors with higher weight coefficients have stronger constraint strengths for their corresponding factor edges, and vice versa. This yields a weighted factor graph model. Subsequently, the weighted factor graph model can be solved nonlinearly using the Gauss-Newton method or the Levenberg-Marquardt method. The objective of the nonlinear optimization is to minimize the sum of squared residuals of all factor edges in the weighted factor graph model. By iteratively updating the state values of the pose nodes and feature point landmark nodes of the unmanned system, the optimal pose estimate and the corresponding covariance matrix in the current state can be obtained. The covariance matrix is used to characterize the uncertainty of the optimal pose estimate, providing a numerical basis for subsequent consistency verification.
[0104] After completing the initial solution for the optimal pose estimation, robustness verification can be performed by calculating the innovative Mahalanobis distance to ensure the accuracy of the pose estimation. The innovative Mahalanobis distance is used to quantify the degree of consistency between the optimal pose estimation and the actual observations from each sensor, and its calculation formula is as follows:
[0105]
[0106] in, The new Mahalanobis distance; The innovation vector is the deviation vector between the optimal pose estimate and the actual observations of each sensor, and its dimension is consistent with the dimension of the sensor observation space. The covariance matrix corresponding to the optimal pose estimation is obtained by a nonlinear optimization process. It is the inverse of the covariance matrix.
[0107] Furthermore, the calculated innovative Mahalanobis distance can be compared with a preset consistency threshold, which is set based on the pose estimation accuracy requirements of the coastline inspection scenario. If the innovative Mahalanobis distance does not exceed the preset consistency threshold, it indicates that the current optimal pose estimation is highly consistent with the actual observations of each sensor. This optimal pose estimation can be used as the final optimal pose estimation output for subsequent point cloud distortion correction and global inspection map construction. If the innovative Mahalanobis distance exceeds the preset consistency threshold, a rollback mechanism can be triggered, indicating that there is a deviation between the current fusion weight configuration and sensor observations. It is necessary to reduce the confidence of each sensor and re-optimize the weighted factor graph model nonlinearly. During the rollback process, the optimization strategy can be adjusted by reducing the overall weight allocation of the fusion weight coefficients and increasing the initial information matrix weight of the factor edges, until the innovative Mahalanobis distance meets the consistency threshold requirement, ensuring that the final output optimal pose estimation has sufficient accuracy and robustness to adapt to the complex dynamic environment of coastline inspection.
[0108] In one embodiment, optimal pose estimation is used to correct motion distortion in the updated filtered LiDAR point cloud data to obtain a corrected point cloud frame. This frame is then combined with a fused feature map for dynamic target removal, and a global inspection map is constructed, including:
[0109] Using optimal pose estimation, motion distortion correction is performed on the LiDAR point cloud in the updated filtered data. Each point in the LiDAR point cloud is transformed to the preset coordinate system according to the corresponding timestamp through pose interpolation to obtain the corrected point cloud frame.
[0110] The fused feature map is input into the semantic segmentation network to obtain the semantic category label of each pixel. Motion consistency check is performed on the point clouds in the corrected point cloud frame that correspond to the semantic category label of dynamic target. The position change of the point cloud with the semantic category label of dynamic target in multiple consecutive frames is compared with the motion of the unmanned system itself. If the matching relationship is inconsistent, the point cloud with the semantic category label of dynamic target is judged as a dynamic object and removed from the corrected point cloud frame to obtain the static point cloud frame.
[0111] Based on the optimal pose estimation, the static point cloud frame is transformed to a preset coordinate system to obtain the transformed static point cloud frame. The transformed static point cloud frame is then fused and downsampled with the local map accumulated during the inspection process to obtain the updated local map. Using the fused feature map, loop closure detection is performed on the updated local map and the historical frame map. When a loop closure is detected, global pose map optimization is triggered to adjust the historical pose of the unmanned system. The updated local map and the historical frame map are then integrated to generate global map data. Semantic category labels are overlaid on the global map data to construct a global inspection map.
[0112] Specifically, LiDAR point cloud acquisition is completed point-by-point scanning. Within a single frame's point cloud acquisition cycle, the unmanned system is in continuous motion, causing positional shifts in the original sensor coordinate system for point clouds acquired at different timestamps, resulting in motion distortion. Directly using this for mapping may lead to geometric distortion. Therefore, based on optimal pose estimation, each LiDAR point can be pose-interpolated using spherical linear interpolation according to its acquisition timestamp, transforming it to a preset coordinate system. This preset coordinate system is a globally unified coordinate system used to integrate multi-frame point cloud data. The core formula for pose interpolation can be expressed as:
[0113]
[0114] in, For lidar points In timestamp The corresponding pose transformation matrix; This represents the optimal pose matrix at the start of a single-frame point cloud acquisition. SLERP(·) is the optimal pose matrix at the end of a single frame point cloud acquisition; SLERP(·) is a spherical linear interpolation function used to achieve smooth pose transition in a special Euclidean group SE(3) to avoid rotational distortion caused by linear interpolation. It is the identity matrix; For lidar points The time difference relative to the start time of a single frame acquisition; This represents the total acquisition time for a single frame of point cloud data. The original coordinates of each LiDAR point are then processed... Transform to the preset coordinate system to obtain the corrected point cloud frame after eliminating motion distortion.
[0115] Furthermore, the fused feature map output from the multimodal feature extraction stage can be input into a semantic segmentation network. This semantic segmentation network employs an encoder-decoder structure and can perform pixel-by-pixel classification on the fused feature map using a trained model, outputting a semantic category label for each pixel. Semantic categories can include static backgrounds, dynamic vessels, pedestrians, birds, etc. Projecting the corrected point cloud frames onto the image plane of the fused feature map using camera intrinsic and extrinsic parameters establishes a one-to-one correspondence between point clouds and pixels, allowing for the selection of a subset of point clouds whose corresponding semantic category label is a dynamic target. The positional changes of this subset across multiple consecutive frames can be compared to the motion of the unmanned system itself, performing a motion consistency check on the subset. Specifically, the positional changes of static point clouds should be entirely driven by the motion of the unmanned system, with their displacement vectors consistent with the theoretical displacement vector derived from the unmanned system's pose transformation. However, the displacement vectors of dynamic point clouds contain their own motion components, resulting in deviations from the theoretical displacement vectors. If the displacement deviation between the two exceeds a preset threshold, the subset of point cloud can be determined to be a dynamic object, and it can be removed from the corrected point cloud frame to obtain a static point cloud frame containing only static environmental information, thus avoiding dynamic targets from interfering with the construction of the static geometric structure of the global map.
[0116] Subsequently, based on the optimal pose estimation, the static point cloud frame is transformed to a preset coordinate system, resulting in a transformed static point cloud frame. A voxel downsampling method is then used to fuse this frame with the local map accumulated during the inspection process. This allows for deduplication and compression of point clouds in overlapping areas, preserving key geometric features to obtain an updated local map. Loop closure detection is performed using the fused feature map. Key features are extracted from the current local map and historical frame maps and matched. If similar regions with more than a preset threshold of feature matches exist, a loop closure is detected, indicating that the unmanned system has returned to a previously traversed scene area. This also triggers global pose map optimization, connecting historical pose nodes with the current pose node through loop closure constraints to adjust the historical pose of the unmanned system and eliminate accumulated drift in pose estimation. Finally, the updated local map and historical frame maps are integrated to generate global map data. Semantic category labels are then overlaid on this global map data, constructing a global inspection map containing 3D geometric and semantic information, providing accurate environmental constraints for subsequent target detection and abnormal behavior analysis.
[0117] In one embodiment, the candidate areas include smuggling vessel candidate areas and shore-based smuggling passenger candidate areas;
[0118] Candidate regions are extracted based on the global inspection map and fused feature map. These candidate regions and the fused feature map are then input into a target detection network for target recognition to obtain the target location. Combined with the target information from consecutive frames and optimal pose estimation, a target trajectory is generated. Based on the target trajectory and the global inspection map, target behavior is analyzed, and anomaly event reports are generated, including:
[0119] Based on the spatial geometric constraints and semantic category labels of the global inspection map, and combined with the fused feature map, candidate regions for smuggling vessels are extracted in the sea area, and candidate regions for shore-based smugglers are extracted in the shore area.
[0120] The candidate areas for smuggling vessels and onshore smugglers are input together with the fused feature map into the target detection network for target recognition to obtain the target location.
[0121] By combining optimal pose estimation, the target position is transformed to a preset coordinate system, and the target position in consecutive frames is correlated to generate the target trajectory.
[0122] The target trajectory is matched with the location relationship between the restricted area boundary and the normal inspection area in the global inspection map to analyze whether the target has abnormal behavior and generate an abnormal event report.
[0123] Specifically, the global inspection map stores annotations of 3D geometric structures and semantic categories. First, based on these semantic category labels, the global inspection map can be divided into a sea area and a shore area. The sea area is the extraction range for smuggling vessel candidate regions, and the shore area is the extraction range for shore-based smuggler candidate regions. Within the sea area, multimodal information from the fused feature map can be combined to detect local areas with ship outline features, infrared thermal imaging features, and semantic offsets from normal waterways, generating smuggling vessel candidate regions. Within the shore area, personnel clustering features and coastline proximity features from the fused feature map can be combined to detect local areas close to the shoreline that differ from normal passage areas, generating shore-based smuggler candidate regions. Furthermore, during candidate region extraction, spatial and semantic constraints can be used to narrow the target detection range to avoid redundant computation and background interference caused by full-map detection, improving subsequent recognition efficiency and accuracy.
[0124] Subsequently, candidate regions for smuggling vessels and shore-based smugglers, along with the fused feature map, can be input into the target detection network. This network employs a multimodal feature-guided single-stage detection structure, using the fused feature map as the base feature input. Candidate regions serve as attention masks, enhancing the feature responses of corresponding regions and suppressing interference from irrelevant background features. The network extracts fine-grained features within the candidate regions through convolutional and pooling layers. A classification head outputs the target category confidence, and a regression head outputs the target bounding box coordinates, yielding the target location. This location can be output in either image planar coordinates or sensor local coordinates, covering both smuggling vessels and shore-based smugglers.
[0125] Specifically, the target position can be transformed to a preset coordinate system based on the optimal pose transformation matrix using camera intrinsic parameters and sensor extrinsic parameters, combined with optimal pose estimation. After the transformation, the Hungarian algorithm can be used to correlate the target positions in consecutive frames, match the cross-frame position information of the same target, and generate the target trajectory. The trajectory consists of the target's position sequence in the preset coordinate system, the corresponding timestamp, and the category label, thus completely recording the target's movement process within the inspection cycle.
[0126] Finally, the target trajectory can be matched with the restricted area boundaries and normal inspection areas on the global inspection map to analyze target behavior. For example, for smuggling vessels, if their trajectory enters the restricted area boundary, deviates from the normal channel for an extended period, or stays in a prohibited area, it can be identified as abnormal behavior. Similarly, for shore-based smugglers, if their trajectory approaches the restricted area boundary, gathers in non-traffic areas, or lingers for an extended period, it can be identified as abnormal behavior. For targets identified as abnormal, an abnormal event report can be generated. The report can include information such as target category, abnormal behavior type, start and end times of the target trajectory, global coordinates, and abnormal area markings on the global inspection map, thereby providing data support and decision-making basis for coastline patrol and law enforcement.
[0127] Based on the same inventive concept, this application also provides a multi-source information fusion system for an air-to-ground unmanned system under the coastline inspection requirement, for implementing the aforementioned method for multi-source information fusion of an air-to-ground unmanned system under the coastline inspection requirement. The solution provided by this system is similar to the solution described in the above method. Therefore, the specific limitations of one or more embodiments of the multi-source information fusion system for an air-to-ground unmanned system under the coastline inspection requirement provided below can be found in the limitations of the multi-source information fusion method for an air-to-ground unmanned system under the coastline inspection requirement described above, and will not be repeated here.
[0128] In one exemplary embodiment, such as Figure 3 As shown, a multi-source information fusion system 300 for an air-to-ground unmanned system is provided to meet the needs of coastline inspection, including:
[0129] The spatiotemporal synchronization module 301 is used to acquire raw data from multiple sources of sensors, perform spatiotemporal synchronization on the raw data from multiple sources of sensors to obtain a synchronized data frame; perform granular calculation processing on the synchronized data frame, divide the data from each sensor in the synchronized data frame into information granules, calculate the generalized information entropy of each information granule, compare the generalized information entropy with a preset dynamic threshold, and filter out high-confidence information granules based on the comparison results to generate filtered data with confidence labels;
[0130] The deep feature fusion module 302 is used to extract multimodal features from the filtered data. It extracts feature maps of visible light images, infrared images and lidar point clouds from the filtered data through a convolutional neural network. It uses a confidence label-guided attention mechanism to perform weighted fusion of the feature maps to generate a fused feature map and corresponding feature point association information. It then performs global average pooling on the fused feature map to extract global environmental features.
[0131] The fault-tolerant localization and weight optimization module 303 is used to construct a factor graph model based on feature point association information. The filtered data is used as the factor edge constraints of the factor graph model. The residual Mahalanobis distance of each factor edge in the factor graph model is calculated. Based on the residual Mahalanobis distance, fault detection is performed on the sensor data in the filtered data, and the health status label of each sensor is output. The faulty sensor data in the filtered data is compensated according to the health status label to obtain the updated filtered data. Based on the sensor health status label and combined with global environmental features, the fusion weight of each sensor is dynamically generated through rough set decision rules. The updated filtered data is weighted and fused according to the fusion weight of each sensor to obtain the optimal pose estimate.
[0132] The environmental perception and anomaly analysis module 304 is used to perform motion distortion correction on the LiDAR point cloud in the updated filtered data using optimal pose estimation, obtain the corrected point cloud frame, perform dynamic target removal by combining the fused feature map, construct a global inspection map, extract candidate regions based on the global inspection map and the fused feature map, input the candidate regions and the fused feature map into the target detection network for target recognition, obtain the target position, combine the target information of the consecutive frames with optimal pose estimation to generate the target trajectory, analyze the target behavior based on the target trajectory and the global inspection map, and generate an anomaly event report.
[0133] In one exemplary embodiment, the present invention also provides a computer device, including a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the steps of the multi-source information fusion method for an unmanned air-to-ground system under the requirements of coastline patrol according to this application. A multi-core processor is preferred to improve the system's parallel processing capability. The memory provides sufficient temporary storage space to support program execution and data processing. The memory capacity should be large enough to accommodate a large amount of data and computational tasks.
[0134] In one exemplary embodiment, the present invention also provides a computer-readable storage medium storing a computer program thereon. When executed by a processor, the computer program implements the steps of a multi-source information fusion method for an unmanned air-to-ground system under the requirements of coastline patrol according to this application. The computer-readable storage medium may include: a read-only memory, a random access memory, a solid-state drive, or an optical disk, etc.
[0135] The above-described embodiments are merely illustrative of several implementation methods of the embodiments of this application, and their descriptions are relatively specific and detailed. However, they should not be construed as limiting the scope of the patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the embodiments of this application, and these modifications and improvements all fall within the protection scope of the embodiments of this application.
Claims
1. A method for multi-source information fusion of unmanned aerial system under the demand of coastline inspection, characterized in that, The method includes: Acquire raw data from multiple sources of sensors, perform spatiotemporal synchronization on the raw data to obtain a synchronized data frame; perform granular computation processing on the synchronized data frame, divide the sensor data in the synchronized data frame into information granules, calculate the generalized information entropy of each information granule, compare the generalized information entropy with a preset dynamic threshold, filter out high-confidence information granules based on the comparison result, and generate filtered data with confidence labels; Multimodal feature extraction is performed on the filtered data. Feature maps of visible light images, infrared images, and lidar point clouds in the filtered data are extracted by a convolutional neural network. The feature maps are weighted and fused using the confidence label-guided attention mechanism to generate a fused feature map and corresponding feature point association information. Global average pooling is performed on the fused feature map to extract global environmental features. A factor graph model is constructed based on the feature point association information. The filtered data is used as the factor edge constraints of the factor graph model. The residual Mahalanobis distance of each factor edge in the factor graph model is calculated. Fault detection is performed on the sensor data in the filtered data based on the residual Mahalanobis distance, and the health status label of each sensor is output. The faulty sensor data in the filtered data is compensated according to the health status label to obtain the updated filtered data. Based on the sensor health status label and combined with the global environmental features, the fusion weight of each sensor is dynamically generated through rough set decision rules. The updated filtered data is weighted and fused according to the fusion weight of each sensor to obtain the optimal pose estimate. The optimal pose estimation is used to correct motion distortion in the updated filtered LiDAR point cloud data to obtain corrected point cloud frames. Dynamic target removal is performed by combining the fused feature map to construct a global inspection map. Candidate regions are extracted based on the global inspection map and the fused feature map. The candidate regions and the fused feature map are input into a target detection network for target recognition to obtain the target position. The target trajectory is generated by combining the target information of consecutive frames with the optimal pose estimation. The target behavior is analyzed based on the target trajectory and the global inspection map to generate an abnormal event report.
2. The method according to claim 1, characterized in that, The information particles include image particles, point cloud particles, temporal particles, and location point particles; The synchronized data frames include image data, point cloud data, inertial measurement unit data, and global navigation satellite system data; The step involves performing granular computation processing on the synchronized data frame, dividing the sensor data in the synchronized data frame into information granules, calculating the generalized information entropy of each information granule, comparing the generalized information entropy with a preset dynamic threshold, filtering out high-confidence information granules based on the comparison result, and generating filtered data with confidence labels, including: The image data in the synchronous data frame is divided using a saliency-based adaptive non-uniform grid, and the grid size is assigned according to the local gradient variance of the image data to obtain the image particles; the point cloud data in the synchronous data frame is divided using a spatial voxel grid to obtain the point cloud particles. A sliding time window is used to divide the inertial measurement unit data in the synchronization data frame, and the sequence of measurement values within a preset time period is taken as a time granule to obtain multiple time granules; The location points in the global navigation satellite system data in the synchronization data frame are clustered and divided into location point particles, and the location points within a continuous preset time window are clustered into the location point particles. For each of the information particles, the corresponding generalized information entropy is calculated; wherein, the generalized information entropy of the image particle is calculated based on the gray-level histogram entropy and pixel standard deviation, the generalized information entropy of the point cloud particle is calculated based on the eigenvalues and intensity distribution of the point cloud covariance matrix, the generalized information entropy of the temporal particle is calculated based on the standard deviation of the rate of change of acceleration and angular velocity and autocorrelation, and the generalized information entropy of the location particle is calculated based on the degree of location dispersion; The preset dynamic threshold of each type of information particle is determined by using the sliding window percentile method. The upper quartile and lower quartile of the generalized information entropy of the same type of information particle in a preset number of data frames are calculated. The upper quartile and lower quartile are substituted into the preset threshold adjustment formula to obtain the preset dynamic threshold of each type of information particle. Information particles whose generalized information entropy exceeds the corresponding preset dynamic threshold are marked as unreliable particles and removed to obtain the removed information particles; the removed information particles are marked as high-confidence information particles, and a confidence label is assigned to each high-confidence information particle to generate the filtered data containing image confidence mask, point cloud confidence label, inertial measurement unit confidence label and global navigation satellite system confidence label.
3. The method of claim 1, wherein, The process involves constructing a factor graph model based on the feature point association information, using the filtered data as factor edge constraints in the factor graph model, calculating the residual Mahalanobis distance of each factor edge in the factor graph model, performing fault detection on each sensor data in the filtered data based on the residual Mahalanobis distance, outputting a health status label for each sensor, and compensating for faulty sensor data in the filtered data based on the health status labels to obtain updated filtered data, including: Based on the feature point association information, feature point landmark nodes are determined, and the factor graph model is constructed with the unmanned system pose node and the feature point landmark node as vertices. The pre-integration factor is calculated based on the inertial measurement unit data in the filtered data, the reprojection factor is calculated based on the visual feature points in the feature point association information, the lidar point cloud computing matching factor is calculated based on the filtered data, and the positioning factor is calculated based on the global navigation satellite system data in the filtered data. The pre-integration factor, the reprojection factor, the matching factor, and the positioning factor are used as factor edge constraints of the factor graph model; wherein, the initial information matrix of each factor edge is obtained by weighting the confidence labels of the corresponding sensors in the filtered data; The factor graph model is nonlinearly optimized to obtain the current optimal state estimate. The residual vector of each factor edge is calculated. The covariance matrix of the residual vector is determined based on the noise model of each sensor. The residual Mahalanobis distance of each factor edge is calculated based on the residual vector and the covariance matrix. The residual Mahalanobis distance of each factor edge is compared with a preset chi-square test threshold. Factor edges whose residual Mahalanobis distance exceeds the preset chi-square test threshold are marked as abnormal edges. For each factor edge corresponding to the same sensor in a consecutive preset number of frames, a time-series cumulative statistical analysis is performed. If the proportion of abnormal edges exceeds a preset proportion threshold, the sensor is determined to be faulty, and the health status label of fault isolation state is output. Otherwise, the health status label of normal state is output. Sensors with a health status label indicating a fault isolation state are designated as faulty sensors. Abnormal edges corresponding to these faulty sensors and the initial information matrix corresponding to these abnormal edges are removed. Based on the data of sensors with a health status label indicating a normal state in the filtered data, a prediction model is constructed using the historical valid measurement values of the faulty sensors. Compensation data for the faulty sensors is generated through this prediction model. The compensation data replaces the measurement values of the corresponding faulty sensors in the filtered data, resulting in the updated filtered data.
4. The method of claim 1, wherein, Based on the sensor health status labels and combined with the global environmental features, the fusion weights of each sensor are dynamically generated using rough set decision rules. The updated filtered data is then weighted and fused according to these fusion weights to obtain the optimal pose estimate. This process includes: Acquire fusion effect data under different sensor state combinations and different environment types, and construct a rough set decision table. The rows of the rough set decision table represent the combination of the sensor state combination and the environment type, and the columns represent the weight configuration of each sensor. The sensor health status label is used as a conditional attribute, and the global environmental features are used as an environmental type representation. They are input into the rough set decision table and matched using the identifiable matrix attribute reduction algorithm to obtain the fusion weight coefficients of each sensor corresponding to the current state. Based on the factor graph model, the initial information matrix of the corresponding factor edge is scaled using the fusion weight coefficients of each sensor to obtain a weighted factor graph model; the weighted factor graph model is then solved by nonlinear optimization to obtain the optimal pose estimate and the corresponding covariance matrix of the current state. The innovative Mahalanobis distance between the optimal pose estimate of the current state and the actual observations of each sensor in the updated filtered data is calculated. If the innovative Mahalanobis distance does not exceed a preset consistency threshold, the optimal pose estimate of the current state is output as the final optimal pose estimate. If the innovative Mahalanobis distance exceeds the preset consistency threshold, a fallback mechanism is triggered to reduce the confidence level of each sensor and to re-optimize the weighted factor graph model nonlinearly.
5. The method according to claim 1, characterized in that, The optimal pose estimation is used to correct motion distortion in the updated filtered LiDAR point cloud data, resulting in a corrected point cloud frame. This frame is then combined with the fused feature map for dynamic target removal, constructing a global inspection map, including: Using the optimal pose estimation, motion distortion correction is performed on the lidar point cloud in the updated filtered data. Each point in the lidar point cloud is transformed to a preset coordinate system according to the corresponding timestamp through pose interpolation to obtain the corrected point cloud frame. The fused feature map is input into the semantic segmentation network to obtain the semantic category label for each pixel. Motion consistency is checked on the point clouds in the corrected point cloud frame that correspond to the semantic category label as dynamic targets. The position changes of the point clouds with the semantic category label as dynamic targets in multiple consecutive frames are compared with the motion of the unmanned system itself. If the matching relationship is inconsistent, the point clouds with the semantic category label as dynamic targets are determined to be dynamic objects and removed from the corrected point cloud frame to obtain static point cloud frames. Based on the optimal pose estimation, the static point cloud frame is transformed to the preset coordinate system to obtain the transformed static point cloud frame. The transformed static point cloud frame is then fused and downsampled with the local map accumulated during the inspection process to obtain an updated local map. Using the fused feature map, loop closure detection is performed on the updated local map and the historical frame map. When a loop closure is detected, global pose map optimization is triggered to adjust the historical pose of the unmanned system. The updated local map and the historical frame map are then integrated to generate global map data. The semantic category label is superimposed on the global map data to construct the global inspection map.
6. The method according to claim 5, characterized in that, The candidate regions include smuggling vessel candidate regions and shore-based smuggler candidate regions; the process involves extracting candidate regions based on the global inspection map and the fused feature map, inputting the candidate regions and the fused feature map into a target detection network for target recognition to obtain the target location, combining the target information from consecutive frames associated with the optimal pose estimation to generate a target trajectory, analyzing target behavior based on the target trajectory and the global inspection map, and generating an abnormal event report, including: Based on the spatial geometric constraints and semantic category labels of the global inspection map, and combined with the fused feature map, the candidate regions of smuggling vessels are extracted in the sea area, and the candidate regions of shore-based smugglers are extracted in the shore area. The candidate regions of smuggling vessels, the candidate regions of shore-based smugglers, and the fused feature map are input together into the target detection network for target recognition to obtain the target location. The target position is transformed to the preset coordinate system by combining the optimal pose estimation, and the target trajectory is generated by associating the target positions of consecutive frames; The target trajectory is matched with the positional relationship between the restricted area boundary and the normal inspection area in the global inspection map, respectively, to analyze whether the target has abnormal behavior and generate the abnormal event report.
7. The method according to claim 4, characterized in that, The fusion weighting coefficients of each of the sensors are dynamically generated using the following formula: in, For the first Frame sensor The fusion weight coefficient satisfies ; For data frame sequence number; The condition attribute set in the rough set decision table; The set of decision attributes in the rough set decision table; It is an approximate accuracy function; For conditional attribute set For decision attribute set Approximate accuracy; For sensors The corresponding health status attribute is a subset of the conditional attributes. One of the elements; To eliminate sensors Corresponding health status attributes Approximate accuracy after; A collection of sensors; For sensor set The traversal variable; For traversal variables The corresponding sensor health status attributes; For the first Frame sensor Health status labels; For sensor-based Correction factor for health status; For the first Global environmental characteristic variables of the frame; The adaptation coefficient is based on the global environmental characteristics.
8. A multi-source information fusion system for an air-to-ground unmanned system to meet the needs of coastline inspection, characterized in that, The system includes: The spatiotemporal synchronization module is used to acquire raw data from multiple sources of sensors, perform spatiotemporal synchronization on the raw data from multiple sources of sensors to obtain a synchronized data frame; perform granular calculation processing on the synchronized data frame, divide the sensor data in the synchronized data frame into information granules, calculate the generalized information entropy of each information granule, compare the generalized information entropy with a preset dynamic threshold, and filter out high-confidence information granules based on the comparison result to generate filtered data with confidence labels; The deep feature fusion module is used to extract multimodal features from the filtered data. It extracts feature maps of visible light images, infrared images, and lidar point clouds from the filtered data through a convolutional neural network. It uses the confidence label-guided attention mechanism to perform weighted fusion of the feature maps to generate a fused feature map and corresponding feature point association information. It then performs global average pooling on the fused feature map to extract global environmental features. The fault-tolerant localization and weight optimization module is used to construct a factor graph model based on the feature point association information, use the filtered data as the factor edge constraints of the factor graph model, calculate the residual Mahalanobis distance of each factor edge in the factor graph model, perform fault detection on each sensor data in the filtered data based on the residual Mahalanobis distance, output the health status label of each sensor, compensate the faulty sensor data in the filtered data according to the health status label, and obtain updated filtered data; based on the sensor health status label and combined with the global environmental features, dynamically generate the fusion weight of each sensor through rough set decision rules, and perform weighted fusion estimation on the updated filtered data according to the fusion weight of each sensor to obtain the optimal pose estimation; The environmental perception and anomaly analysis module is used to perform motion distortion correction on the LiDAR point cloud in the updated filtered data using the optimal pose estimation, to obtain a corrected point cloud frame. It then combines this with the fused feature map to perform dynamic target removal, construct a global inspection map, extract candidate regions based on the global inspection map and the fused feature map, input the candidate regions and the fused feature map into a target detection network for target recognition, obtain the target location, combine the target information from consecutive frames associated with the optimal pose estimation to generate a target trajectory, analyze target behavior based on the target trajectory and the global inspection map, and generate an anomaly event report.
9. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, When the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, When the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 7.