A multimodal sensor orientation measurement fusion system
By using hardware synchronization and an improved multimodal sensor fusion system, the problems of insufficient time synchronization and abnormal state detection in multimodal sensor orientation measurement are solved, achieving high-precision and robust sensor data fusion and improving the system's reliability and practicality in complex environments.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA UNIV OF PETROLEUM (EAST CHINA)
- Filing Date
- 2025-12-31
- Publication Date
- 2026-06-30
Abstract
Description
Technical Field
[0001] This invention relates to the field of sensor measurement technology, specifically to a multimodal sensor orientation measurement fusion system.
Background Technology
[0002] Multimodal sensing and orientation measurement technology, as a core component of modern intelligent sensing and autonomous systems, is widely used in key scenarios such as robot navigation, augmented reality, autonomous driving, and industrial automation. This technology aims to achieve accurate and robust estimation of the position and orientation of a target or itself in three-dimensional space by fusing sensor data from different physical principles.
[0003] Among them, the multimodal sensor orientation measurement fusion system is the key technical path to achieve the above objectives. Its core lies in the comprehensive utilization of various sensing information such as inertial measurement units, visual sensors, global positioning systems and lidar, and the improvement of the accuracy, reliability and environmental adaptability of orientation measurement through data fusion algorithms.
[0004] Existing technologies typically employ loosely or tightly coupled fusion frameworks; however, these methods face significant challenges when processing multi-source heterogeneous sensor data. Differences in sampling frequencies, data formats, and noise characteristics among different sensors lead to insufficient time synchronization and spatial calibration accuracy. In dynamic or feature-deficient environments, individual sensors are prone to failure or performance degradation, while existing fusion strategies have limited ability to detect and adaptively compensate for sensor anomalies. Furthermore, traditional Kalman filtering or optimization methods often suffer from high computational complexity, poor real-time performance, and divergent fusion results when dealing with nonlinearity, non-Gaussian noise, and large-scale real-time data processing. These shortcomings severely restrict the reliability, accuracy, and practicality of multimodal orientation measurement systems in complex application scenarios. Therefore, there is an urgent need for a multimodal sensor orientation measurement fusion system that can achieve high efficiency, robustness, and adaptability.
Summary of the Invention
[0005] The technical problem to be solved by the present invention is to overcome the shortcomings of the existing multimodal sensor orientation measurement fusion system, such as insufficient time synchronization accuracy, limited ability to detect and compensate for abnormal sensor states, and performance degradation of fusion algorithm under nonlinear non-Gaussian noise environment, and to provide a multimodal sensor orientation measurement fusion system.
[0006] The technical solution of this invention is as follows: A multimodal sensor orientation measurement fusion system, comprising a sensor array module, a spatiotemporal alignment preprocessing module, a multimodal feature extraction and encoding module, an adaptive weighted fusion decision module, and a system output and feedback control module. The sensor array module consists of an inertial measurement unit (IMU), a stereo vision camera, a global navigation satellite system (GNSS) receiver, and a 2D lidar. Each sensor achieves microsecond-level synchronous sampling using a hardware trigger signal. The spatiotemporal alignment preprocessing module receives the raw data stream from the sensor array module. First, it unifies the timestamps of each sensor using a phase-locked loop-based clock drift compensation algorithm. Then, it transforms the observation data from each sensor to a unified carrier coordinate system using a pre-calibrated inter-sensor extrinsic parameter matrix. The multimodal feature extraction and encoding module processes the spatiotemporally aligned data in parallel. IMU data is processed by quaternion complementary filtering to obtain the initial attitude value of the carrier. Stereo vision camera images are used to generate 3D point clouds through an improved ORB feature extraction and stereo matching algorithm. GNSS receiver signals are processed by carrier-phase smoothing of pseudoranges to output position information. 2D lidar scan data is matched between frames by an iterative closest point (ICP) algorithm to estimate displacement. The adaptive weighted fusion decision module receives the multi-path feature vectors output by the multimodal feature extraction and encoding module. This module integrates a sliding window-based sensor health assessment submodule. This submodule scores the health of each sensor based on the confidence index and historical consistency of the output data, and the health score serves as the basis for dynamic adjustment of subsequent fusion weights. The fusion core employs an improved cubature Kalman filter framework, which extends the traditional state prediction and update steps into a multi-model interaction structure. Each model corresponds to the observation likelihood function of a sensor combination, and the activation probability of each model is adjusted in real time by the output of the sensor health assessment submodule. Finally, the optimal attitude and position estimates are output through a weighted summation method. The system output and feedback control module uses the optimal estimate generated by the adaptive weighted fusion decision module as the final output of the system. The module also has a closed-loop correction mechanism, which back-projects the current optimal estimate to the observation space of each sensor, calculates the residuals, and uses them to update the sensor calibration parameters and the noise covariance matrix of the fusion filter online.
[0007] Furthermore, the inertial measurement unit in the sensor array module includes a 3-axis MEMS gyroscope and a 3-axis MEMS accelerometer, with a data output frequency of 200 Hz; the stereo vision camera uses a global shutter CMOS sensor with a resolution of 1280 x 720 pixels and a frame rate of 30 Hz; the global navigation satellite system receiver supports both BeiDou and GPS systems, with a positioning data update rate of 1 Hz; the 2D lidar has a scanning frequency of 10 Hz and an angular resolution of 0.5 degrees. All sensors are connected to a central synchronization signal generator via a dedicated synchronization signal line. This generator produces synchronization pulses at a frequency of 1000 Hz, ensuring that all sensor data acquisition times are aligned within a 1-millisecond error range.
[0008] Furthermore, the specific implementation process of the phase-locked loop-based clock drift compensation algorithm in the spatiotemporal alignment preprocessing module is as follows: using the 1 pulse per second signal from the global navigation satellite system receiver as the reference clock, the phase difference between the internal clocks of other sensors and this reference is detected; a proportional-integral controller dynamically adjusts the interpolation time of each sensor's data, ensuring that all sensor data streams achieve timestamp synchronization. The extrinsic parameter matrix between sensors is obtained through offline hand-eye calibration. The calibration process utilizes a high-precision checkerboard calibration board, which is simultaneously observed by a stereo vision camera and a 2D lidar, and the optimal rotation matrix and translation vector are solved through singular value decomposition.
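The proportional-integral loop described above can be illustrated with a minimal sketch. It assumes the phase difference is measured once per PPS edge and that the controller output is a running offset correction subtracted from sensor timestamps; the class name, the gains, and the simulated 50 ppm drift are hypothetical and are not taken from the patent.

```python
class PiClockTracker:
    """Tracks a sensor clock's offset from the GNSS PPS reference with a PI loop."""

    def __init__(self, kp=0.5, ki=0.1):
        self.kp = kp            # proportional gain (assumed value)
        self.ki = ki            # integral gain (assumed value)
        self.integral = 0.0     # accumulated phase error, seconds
        self.correction = 0.0   # current estimate of the clock offset, seconds

    def on_pps(self, local_time, reference_time):
        """Update the correction at a PPS edge and return the residual phase error."""
        error = (local_time - self.correction) - reference_time
        self.integral += error
        self.correction += self.kp * error + self.ki * self.integral
        return error

    def to_reference(self, local_timestamp):
        """Map a sensor timestamp onto the reference timeline."""
        return local_timestamp - self.correction


# Simulated sensor clock: 2 ms initial offset plus 50 ppm frequency drift.
tracker = PiClockTracker()
drift, offset = 50e-6, 0.002
for second in range(1, 61):
    local = second * (1.0 + drift) + offset
    residual = tracker.on_pps(local, float(second))
```

Because the correction is itself accumulated and an integral term is added on top, the loop can follow a steadily growing offset (constant frequency drift) without a standing error, which is the reason for using PI rather than purely proportional control here.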
[0009] Furthermore, the improved ORB feature extraction algorithm in the multimodal feature extraction and encoding module introduces an adaptive threshold adjustment mechanism based on the traditional ORB. This mechanism dynamically calculates the FAST corner threshold for feature point extraction according to the overall grayscale distribution of the image, ensuring that a stable number of feature points can still be extracted under varying lighting conditions. The stereo matching algorithm adopts a semi-global matching method, obtaining a disparity map through multi-path cost aggregation, and removing mismatched points through left-right consistency checks.
[0010] Furthermore, the sliding window-based sensor health assessment submodule in the adaptive weighted fusion decision module maintains a sliding window with a length of 50 data points. For each sensor's data within the window, the Mahalanobis distance between it and the current optimal estimate of the system is calculated, and this distance is mapped to a health score between 0 and 1. Sensors with a health score below 0.3 are judged as abnormal, and their fusion weight at that moment is automatically set to 0. The improved cubature Kalman filter framework uses a third-degree spherical-radial cubature rule to approximate the posterior probability density function of the state. Its multi-model interaction structure contains four filter models running in parallel, corresponding to sensor combinations using only an inertial measurement unit, an inertial measurement unit plus a stereo vision camera, an inertial measurement unit plus a global navigation satellite system receiver, and an inertial measurement unit plus a 2D lidar, respectively. The observation likelihood function of each model is modeled as a Gaussian distribution based on the noise characteristics of the corresponding sensor, and its mean and covariance are learned from historical data. The activation probability of each model is determined by normalizing the sensor health scores using a softmax function.
[0011] Furthermore, the specific workflow of the closed-loop correction mechanism of the system output and feedback control module is as follows: combine the current optimal attitude and position estimate of the system with the known sensor model to predict the observation values that each sensor should have at this moment; calculate the residual between the predicted observation values and the actual sensor observation values; if the moving average of the residual sequence continues to exceed the preset threshold, the calibration parameter online update process is initiated. This process uses the Levenberg-Marquardt optimization algorithm to minimize the sum of squared residuals, thereby iteratively optimizing the sensor extrinsic parameter matrix and intrinsic parameter distortion coefficient.
[0012] Furthermore, the entire system runs on an embedded computing platform that integrates a multi-core ARM processor and an FPGA programmable logic unit. Sensor data acquisition and spatiotemporal alignment preprocessing tasks are implemented on the FPGA to ensure real-time performance, while multimodal feature extraction and adaptive fusion algorithms run on the ARM processor. The overall system power consumption is controlled within 5 watts, the attitude measurement accuracy reaches 0.1 degrees, and the position measurement accuracy reaches 0.1 meters.
[0013] Compared with the prior art, the beneficial effects achieved by the present invention are:
[0014] 1. This invention systematically solves the core challenges in multimodal sensor orientation measurement by constructing a complete technical chain that includes precise hardware synchronization, high-precision spatiotemporal alignment, robust feature extraction, adaptive weighted fusion based on health assessment, and closed-loop feedback correction. The hardware synchronization design of the sensor array module ensures data temporal consistency from the source, significantly reducing fusion errors introduced by asynchronous sampling. The phase-locked loop clock compensation and precise extrinsic parameter calibration of the spatiotemporal alignment preprocessing module lay an accurate spatiotemporal reference for subsequent fusion.
[0015] 2. The adaptive feature extraction algorithm employed in the multimodal feature extraction and encoding module enhances the system's perception robustness in challenging environments such as varying illumination and missing textures. The adaptive weighted fusion decision module combines sensor health assessment with multi-model cubature Kalman filtering, enabling rapid detection and soft isolation of abnormal sensors and preventing faulty sensors from contaminating the fusion results. The improved filtering framework also handles nonlinear non-Gaussian noise effectively, improving the accuracy and stability of state estimation. The closed-loop correction mechanism of the system output and feedback control module gives the system online self-calibration capability, continuously compensating for sensor drift and installation errors so that measurement accuracy does not decay over long-term operation. Overall, the system significantly outperforms existing technologies in terms of accuracy, robustness, adaptability, and real-time performance.
Attached Figure Description
[0016] Figure 1 is a schematic diagram of the overall technical solution architecture of the multimodal sensor orientation measurement system proposed in this invention;
[0017] Figure 2 is a schematic diagram of the core principle framework of the adaptive weighted fusion decision module in this invention;
[0018] Figure 3 is a logical flowchart of the spatiotemporal alignment preprocessing module in this invention;
[0019] Figure 4 is a schematic diagram of the multi-level interaction relationship and data flow of the multimodal feature extraction and encoding module in this invention;
[0020] Figure 5 is a schematic diagram illustrating the principle framework of the closed-loop correction mechanism of the system output and feedback control module in this invention.
Detailed Implementation
[0021] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0022] Example 1
[0023] Referring to Figure 1, this embodiment details the specific hardware configuration and software execution flow of a multimodal sensor orientation measurement fusion system. The system is physically deployed on a highly integrated embedded computing platform, the core of which includes a multi-core ARM architecture processor and a field-programmable gate array (FPGA). The FPGA is responsible for executing all low-level data acquisition and preprocessing tasks with high real-time requirements, and its parallel processing capability ensures that the system's response latency to sensor data is less than 1 millisecond. The multi-core ARM processor handles complex upper-level algorithm calculations, including multimodal feature extraction, adaptive fusion decision-making, and system feedback control. The power supply design of the entire system has been optimized so that total power consumption stays within 5 watts during full-load operation, making it particularly suitable for power-sensitive mobile or long-term field monitoring scenarios.
[0024] The system's basic data comes from the sensor array module. This module is a multi-sensor assembly, specifically containing four sensing units with different functions: an inertial measurement unit (IMU), a stereo vision camera, a global navigation satellite system (GNSS) receiver, and a 2D lidar. The IMU integrates a 3-axis MEMS gyroscope and a 3-axis MEMS accelerometer, with a fixed data output frequency of 200 Hz, continuously providing high-frequency measurements of the carrier's angular velocity and linear acceleration. The stereo vision camera uses a global shutter complementary metal-oxide-semiconductor (CMOS) sensor with an image resolution of 1280 x 720 pixels, capturing stereo image pairs of the scene at a fixed frame rate of 30 Hz. The GNSS receiver is designed to simultaneously support both the BeiDou Navigation Satellite System and the Global Positioning System (GPS). It calculates the carrier's absolute position information by receiving signals from multiple satellites, with a positioning data update rate of 1 Hz. The 2D lidar scans the surrounding environment by emitting laser beams and receiving reflected signals. Its scanning frequency is 10 Hz, and it can acquire 720 distance measurement points per scan cycle, corresponding to an angular resolution of 0.5 degrees. To achieve high consistency of data from all sensors in the time dimension, the system employs a hardware-based synchronization mechanism. A central synchronization signal generator is deployed as the core timing control unit, stably generating high-precision synchronization pulses at a frequency of 1000 Hz. Each synchronization pulse is simultaneously sent to the four sensors via a dedicated synchronization signal line, forcing them to acquire data within the same microsecond-level time window. This design ensures that data samples from different sensors have a traceable correspondence in timestamps, minimizing fusion errors caused by asynchronous sampling times.
[0025] The raw data stream acquired by the sensor array module is first sent to the spatiotemporal alignment preprocessing module for processing. Referring to Figure 3, the core task of this module is to address the inherent time asynchrony and spatial coordinate system inconsistencies in multi-sensor data. The time alignment process is achieved through a clock drift compensation algorithm based on the phase-locked loop (PLL) principle. This algorithm establishes the 1-pulse-per-second signal output from the global navigation satellite system (GNSS) receiver as the sole time reference for the entire system. The algorithm continuously monitors the minute deviations between the internal clock phases of the inertial measurement unit (IMU), stereo vision camera, and 2D LiDAR and this reference clock. A digital proportional-integral (PI) controller is embedded in the algorithm, dynamically calculating the compensation amount based on the detected phase difference and adjusting the timing of resampling or interpolation of the other sensor data streams in real time. Through this closed-loop control, the timestamps of all sensor data are ultimately unified to the GNSS timeline, eliminating the accumulated time error caused by crystal oscillator frequency drift. After time alignment, the data enters the spatial coordinate unification stage. Before the system is put into use, a precise offline hand-eye calibration process must be performed. During this process, a high-precision checkerboard calibration board of known dimensions is placed simultaneously in the fields of view of the stereo vision camera and the 2D LiDAR. By acquiring multiple sets of images and laser point cloud data from the calibration board at different poses, the optimal rotation matrix and translation vector of the stereo vision camera relative to the carrier coordinate system, and the optimal rotation matrix and translation vector of the 2D LiDAR relative to the carrier coordinate system, are calculated using the singular value decomposition algorithm. These rotation matrices and translation vectors together constitute the inter-sensor extrinsic parameter matrix. During online system operation, the spatiotemporal alignment preprocessing module uses these pre-calibrated extrinsic parameter matrices to transform all data points observed by the stereo vision camera and 2D LiDAR into a unified carrier coordinate system. For the inertial measurement unit and the global navigation satellite system receiver, the data are usually already expressed in the carrier coordinate system, or can be aligned through a simple transformation. The data processed by this module is both synchronized in time and expressed in the same spatial reference frame, laying a solid foundation for subsequent feature extraction and fusion.
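As an illustration of the singular value decomposition step, the following sketch solves for the rotation and translation between two sets of corresponding 3D points (the Kabsch method). It assumes point correspondences from the calibration board are already available; the function name and the test geometry are hypothetical, and NumPy is an assumed dependency.

```python
import numpy as np


def rigid_transform_svd(src, dst):
    """Least-squares R, t such that dst ≈ R @ src + t (rows are 3D points)."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)      # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


# Synthetic check: rotate 30 degrees about z and translate.
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.3])
pts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
obs = pts @ R_true.T + t_true
R_est, t_est = rigid_transform_svd(pts, obs)
```

Stacking `R_est` and `t_est` into a 4x4 homogeneous matrix gives one extrinsic parameter matrix of the kind applied online to camera or LiDAR points.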
[0026] The standardized data, after spatiotemporal alignment, is then fed in parallel into the multimodal feature extraction and encoding module. Referring to Figure 4, this module employs specially optimized algorithms to extract feature information for orientation estimation from the different types of sensor data, forming a multi-dimensional feature vector. For the inertial measurement unit (IMU) data stream, the core processing method is a quaternion-based complementary filtering algorithm. This algorithm simultaneously receives the angular velocity output from the gyroscope and the specific force output from the accelerometer. Angular velocity is integrated directly to calculate attitude change, but this introduces accumulated errors; the accelerometer can provide an absolute attitude reference by measuring the direction of gravity, but it is unreliable under dynamic acceleration. The complementary filter combines the advantages of both through an adjustable filtering coefficient: it trusts the gyroscope in the high-frequency range and the accelerometer in the low-frequency range. Finally, a quaternion representing the three-dimensional attitude of the carrier is calculated in real time as a preliminary estimate of the attitude. The output frequency of this process matches the IMU sampling rate of 200 Hz.
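One common way to realize such a filter is to feed the gravity-direction error back into the gyroscope rate before integrating the quaternion. The sketch below follows that pattern; the function names, the gain value, and the quaternion convention (w, x, y, z, body to world) are assumptions for illustration and are not specified by the patent.

```python
import math


def gravity_in_body(q):
    """Direction of the world z-axis expressed in the body frame."""
    w, x, y, z = q
    return (2 * (x * z - w * y), 2 * (w * x + y * z), w * w - x * x - y * y + z * z)


def complementary_update(q, gyro, accel, dt, gain=2.0):
    """One filter step: gyro integration corrected by the accelerometer."""
    gx, gy, gz = gyro
    norm = math.sqrt(sum(a * a for a in accel))
    if norm > 0.0:
        ax, ay, az = (a / norm for a in accel)
        vx, vy, vz = gravity_in_body(q)
        # Cross product of measured and predicted gravity directions:
        # a small rotation that pulls the estimate toward the accelerometer.
        gx += gain * (ay * vz - az * vy)
        gy += gain * (az * vx - ax * vz)
        gz += gain * (ax * vy - ay * vx)
    w, x, y, z = q
    half = 0.5 * dt
    qn = (w + half * (-x * gx - y * gy - z * gz),
          x + half * (w * gx + y * gz - z * gy),
          y + half * (w * gy - x * gz + z * gx),
          z + half * (w * gz + x * gy - y * gx))
    n = math.sqrt(sum(c * c for c in qn))
    return tuple(c / n for c in qn)


# Stationary carrier rolled by 0.2 rad; the filter starts from identity.
q = (1.0, 0.0, 0.0, 0.0)
accel = (0.0, math.sin(0.2), math.cos(0.2))
for _ in range(2000):                      # 10 s at the 200 Hz IMU rate
    q = complementary_update(q, (0.0, 0.0, 0.0), accel, dt=1 / 200)
roll = 2 * math.atan2(q[1], q[0])
```

A larger `gain` makes the estimate follow the accelerometer more quickly (a higher crossover frequency), while a smaller one leans on the gyroscope for longer.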
[0027] For stereo vision camera data, the processing flow is more complex, aiming to recover depth information from the images. First, improved ORB feature extraction is performed on the acquired left and right images. The FAST corner detection of traditional ORB uses a fixed threshold, whereas this invention adjusts the threshold dynamically according to the grayscale distribution, as follows. Step 1: image grayscale distribution analysis. Compute the grayscale histogram of the current frame and determine the effective dynamic range:
[0028] [formula not recoverable from the source text]
[0029] where one symbol of the formula denotes the total number of pixels. Step 2: adaptive threshold calculation. The dynamic threshold is generated using a formula that is likewise not recoverable from the source text.
[0030] To avoid abrupt changes in the number of feature points, a rate-of-change limit is set:
[0031] [formula not recoverable from the source text]
[0032] Traditional ORB feature extraction uses a fixed FAST corner detection threshold, which is unstable under drastic lighting conditions. This system employs an improved algorithm that introduces an adaptive threshold adjustment mechanism. This mechanism analyzes the overall grayscale histogram distribution of the current image in real time and dynamically calculates an optimal FAST corner threshold based on the image's contrast and brightness levels. For example, in well-lit scenes, the threshold automatically increases to reduce noise point extraction; in dim environments, the threshold decreases to ensure a sufficient number of feature points are extracted. This adaptability guarantees a stable number of high-quality feature points under different lighting conditions. After feature point extraction, the system executes a stereo matching algorithm, employing a semi-global matching method. This method effectively overcomes the ambiguity of local matching by aggregating matching costs along multiple one-dimensional paths, generating a dense or semi-dense disparity map. To further eliminate mismatches, the algorithm also performs a left-right consistency check, checking whether a pixel in the left image, after being matched in the right image, still returns to its original position when matched back in the left image. Points that fail the check are considered mismatches and discarded. Finally, based on camera intrinsic parameters and baseline distance, the disparity map is converted into a 3D point cloud in the carrier coordinate system. This 3D point cloud provides spatial distribution information of feature points in the environment.
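The exact threshold formulas are not available in this text, so the following is only a hypothetical sketch of the idea: derive an effective dynamic range from histogram percentiles, scale it into a FAST threshold, and limit how far the threshold may move per frame. The percentile fractions, the scale factor, the clamp bounds, and the step limit are all assumed values.

```python
def effective_range(hist, low_frac=0.05, high_frac=0.95):
    """Gray-level span between the low and high cumulative fractions."""
    total = sum(hist)
    cumulative, low, high = 0, 0, len(hist) - 1
    found_low = False
    for level, count in enumerate(hist):
        cumulative += count
        if not found_low and cumulative >= low_frac * total:
            low, found_low = level, True
        if cumulative >= high_frac * total:
            high = level
            break
    return high - low


def adaptive_fast_threshold(hist, previous, scale=0.15,
                            t_min=5, t_max=60, max_step=5):
    """Contrast-proportional threshold with a per-frame rate-of-change limit."""
    raw = scale * effective_range(hist)
    raw = max(t_min, min(t_max, raw))
    step = max(-max_step, min(max_step, raw - previous))
    return previous + step


# High-contrast frame: uniform histogram over all 256 gray levels.
bright = adaptive_fast_threshold([10] * 256, previous=20)
# Low-contrast frame: all pixels packed into 20 gray levels.
dim = adaptive_fast_threshold([0] * 100 + [128] * 20 + [0] * 136, previous=20)
```

With these assumed settings the threshold rises by one step for the high-contrast frame and falls by one step for the low-contrast frame, which matches the qualitative behavior described above.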
[0033] For global navigation satellite system (GNSS) receiver data, the focus of processing is improving positioning accuracy. The system employs carrier-phase smoothing of pseudoranges. Pseudorange measurements have relatively high noise but no ambiguity, while carrier phase measurements have high accuracy but contain an integer ambiguity. This technique uses the epoch-to-epoch change of the high-precision carrier phase observations to smooth the noisy pseudorange observations. Through a moving average filter, high-frequency noise in the pseudorange measurements is effectively suppressed, resulting in smoother and more accurate position coordinates at an update rate of 1 Hz.
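A widely used form of this technique is the Hatch filter, sketched below for a single satellite. It assumes the carrier phase has already been converted to meters and contains no cycle slips; the function name and window length are illustrative assumptions.

```python
def hatch_filter(pseudoranges, carrier_ranges, window=100):
    """Smooth pseudoranges using epoch-to-epoch carrier-phase changes."""
    smoothed = []
    for k, (p, phi) in enumerate(zip(pseudoranges, carrier_ranges)):
        if k == 0:
            smoothed.append(p)              # initialize with the raw pseudorange
            continue
        n = min(k + 1, window)
        # Propagate the previous smoothed value with the precise phase change.
        predicted = smoothed[-1] + (phi - carrier_ranges[k - 1])
        smoothed.append(p / n + (n - 1) / n * predicted)
    return smoothed


# Synthetic data: range grows 100 m per epoch, pseudorange noise alternates
# by +/- 2 m, and the carrier range carries a constant ambiguity bias.
true_range = [20_000_000.0 + 100.0 * k for k in range(50)]
noisy = [r + (2.0 if k % 2 == 0 else -2.0) for k, r in enumerate(true_range)]
carrier = [r + 12345.6 for r in true_range]
smooth = hatch_filter(noisy, carrier)
final_error = abs(smooth[-1] - true_range[-1])
```

The constant ambiguity cancels because only differences of the carrier range are used, so the smoothed output keeps the unbiased level of the pseudorange while inheriting the low noise of the phase.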
[0034] For 2D LiDAR data, the processing objective is to infer the displacement of the carrier by comparing consecutive scan frames. The system employs an iterative closest point (ICP) algorithm. The algorithm takes the point clouds of the current frame and the previous frame and iteratively finds the optimal rigid body transformation (rotation and translation) that minimizes the overall distance between the two point clouds. This optimal transformation represents the relative motion of the carrier over the interval between the two frames, from which the displacement vector and angular change are calculated at an output frequency of 10 Hz.
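The frame-to-frame matching can be sketched as a small 2D ICP loop: match each point to its nearest neighbor, solve the rigid alignment in closed form, and repeat. This is a minimal illustration with brute-force matching and hypothetical function names, not the patent's implementation; a practical version would add outlier rejection and a spatial index.

```python
import math


def apply_transform(points, theta, tx, ty):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def best_rigid_2d(src, dst):
    """Closed-form least-squares rotation and translation for paired points."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay, bx, by = px - sx, py - sy, qx - dx, qy - dy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def icp_2d(source, target, iterations=20):
    """Estimate (theta, tx, ty) that maps source onto target."""
    theta, tx, ty = 0.0, 0.0, 0.0
    for _ in range(iterations):
        moved = apply_transform(source, theta, tx, ty)
        # Brute-force nearest neighbor in the target scan for each moved point.
        matched = [min(target, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in moved]
        theta, tx, ty = best_rigid_2d(source, matched)
    return theta, tx, ty


# L-shaped scan and a copy moved by a small known motion.
scan_a = [(0.5 * i, 0.0) for i in range(11)] + [(0.0, 0.5 * j) for j in range(1, 7)]
scan_b = apply_transform(scan_a, 0.05, 0.1, -0.05)
theta_est, tx_est, ty_est = icp_2d(scan_a, scan_b)
```

ICP only converges to the correct motion when the inter-frame displacement is small relative to the point spacing, which is one reason a 10 Hz scan rate matters for this step.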
[0035] The multimodal feature extraction and encoding module finally normalizes and encodes the initial pose values, 3D point clouds, smoothed positions, and displacement vectors output from the four parallel processing channels to form a set of time-aligned, coordinate-unified multi-path feature vectors, which are then sent to the next core module.
[0036] The core decision-making unit of the system is the adaptive weighted fusion decision module. Referring to Figure 2, this module receives the multi-path feature vectors from upstream, and its internal operation comprises two key parts: sensor health assessment and improved fusion filtering. First, a sliding window-based sensor health assessment submodule begins operation. This submodule maintains a fixed-length FIFO queue of 50 data points as a sliding window for each sensor (the inertial measurement unit, stereo vision camera, GNSS receiver, and 2D LiDAR). For each new sensor data point entering the window, the submodule calculates its Mahalanobis distance to the current optimal system state estimate. The Mahalanobis distance takes the covariance structure of the data into account and therefore measures more accurately how anomalous an observation is relative to the estimated distribution. The calculated Mahalanobis distance is then converted into a health score between 0 and 1 using a preset mapping function. The closer the health score is to 1, the more reliable the sensor data; the closer it is to 0, the less reliable it is. The system sets a strict anomaly threshold of 0.3. When the real-time health score of any sensor falls below 0.3, the fusion decision module immediately sets the fusion weight of that sensor to 0 at the current moment, thereby achieving soft isolation of the abnormal sensor and preventing its erroneous data from polluting the fusion result.
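A minimal sketch of such a health monitor is given below. The patent does not specify the mapping function, so an exponential mapping of the window-averaged distance is assumed here; the covariance is also assumed diagonal to keep the example dependency-free. The class name and the `scale` parameter are hypothetical.

```python
import math
from collections import deque


class SensorHealth:
    """Sliding-window health score from Mahalanobis distances (diagonal covariance)."""

    def __init__(self, window=50, scale=3.0, threshold=0.3):
        self.distances = deque(maxlen=window)   # FIFO window of recent distances
        self.scale = scale                      # assumed mapping parameter
        self.threshold = threshold

    def update(self, observation, estimate, variances):
        d2 = sum((o - e) ** 2 / v
                 for o, e, v in zip(observation, estimate, variances))
        self.distances.append(math.sqrt(d2))
        mean_d = sum(self.distances) / len(self.distances)
        return math.exp(-mean_d / self.scale)   # maps [0, inf) onto (0, 1]

    def fusion_weight(self, score):
        """Abnormal sensors (score below threshold) get zero weight."""
        return 0.0 if score < self.threshold else score


monitor = SensorHealth()
for _ in range(50):                              # consistent observations
    healthy_score = monitor.update((1.1, 2.1), (1.0, 2.0), (1.0, 1.0))
for _ in range(50):                              # sensor starts reporting garbage
    faulty_score = monitor.update((11.0, 12.0), (1.0, 2.0), (1.0, 1.0))
```

Averaging over the window is a design choice made in this sketch: it trades detection delay for resistance to single outliers. Scoring each sample individually would react faster but isolate sensors on isolated glitches.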
[0037] Following this, the improved cubature Kalman filter framework is executed. This framework is not a single filter but a multi-model interactive structure comprising four filter models running in parallel. Each model represents a specific sensor combination assumption: Model 1 uses only inertial measurement unit (IMU) data; Model 2 uses combined IMU and stereo vision camera data; Model 3 uses combined IMU and GNSS receiver data; and Model 4 uses combined IMU and 2D LiDAR data. Each model independently performs the standard cubature Kalman filtering steps, including state prediction and state update. The cubature Kalman filter uses a third-degree spherical-radial cubature rule to approximate the posterior probability density function of the state in a nonlinear system, offering higher estimation accuracy and numerical stability than the extended Kalman filter when handling highly nonlinear problems.
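For an n-dimensional state, the third-degree spherical-radial rule evaluates the nonlinear function at 2n equally weighted points placed at plus and minus sqrt(n) along the columns of a square root of the covariance. The sketch below shows only that point generation and propagation step, with process noise omitted; the function names are illustrative and NumPy is an assumed dependency.

```python
import numpy as np


def cubature_points(mean, cov):
    """Return the 2n third-degree cubature points as columns of an n x 2n array."""
    n = mean.size
    S = np.linalg.cholesky(cov)                        # cov = S @ S.T
    unit = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])
    return mean[:, None] + S @ unit


def propagate(mean, cov, f):
    """Push a Gaussian through f using equally weighted cubature points."""
    pts = cubature_points(mean, cov)
    out = np.stack([f(pts[:, i]) for i in range(pts.shape[1])], axis=1)
    new_mean = out.mean(axis=1)
    diff = out - new_mean[:, None]
    new_cov = diff @ diff.T / pts.shape[1]             # add process noise here
    return new_mean, new_cov


# For a linear map the rule is exact, which makes a convenient sanity check.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
m0 = np.array([1.0, 2.0])
P0 = np.array([[0.5, 0.1], [0.1, 0.3]])
m1, P1 = propagate(m0, P0, lambda x: A @ x)
```

Unlike the extended Kalman filter, no Jacobian of `f` is needed; the same routine serves both the prediction step (with the motion model) and the update step (with the observation model).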
[0038] Each model has its own observation likelihood function, modeled as a Gaussian distribution based on the noise characteristics of its corresponding sensor. The mean and covariance matrix of the Gaussian distribution are not fixed but learned online by analyzing sensor data within a historical window, allowing the model to adapt to slow changes in sensor noise characteristics. Crucially, the four models do not contribute equally. Their activation probabilities, or their weights in the final fusion result, are dynamically determined by the health scores calculated by the aforementioned sensor health assessment submodule. Specifically, the health scores of the four sensors are combined into a vector and then normalized using a softmax function. The softmax function ensures that the sum of the weights of all models is 1, and models involving sensors with higher health scores receive higher activation probabilities. Finally, the output of the adaptive weighted fusion decision module is a weighted sum of the state estimates of the four models, with the weights being their respective real-time activation probabilities. This design allows the system to automatically favor sensor combinations with good current health, thus outputting optimal attitude and position estimates in various complex environments.
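The weighting step can be sketched as follows. The text does not spell out how the zero-weight rule for abnormal sensors combines with softmax, so this sketch assumes abnormal entries are masked out before normalization; that choice, the function names, and the example numbers are assumptions.

```python
import math


def activation_probabilities(health_scores, threshold=0.3):
    """Softmax over health scores, with abnormal entries forced to zero."""
    exps = [math.exp(s) if s >= threshold else 0.0 for s in health_scores]
    total = sum(exps)
    if total == 0.0:
        raise ValueError("no healthy sensor combination available")
    return [e / total for e in exps]


def fuse(estimates, weights):
    """Weighted sum of per-model state estimates."""
    dim = len(estimates[0])
    return [sum(w * est[i] for w, est in zip(weights, estimates))
            for i in range(dim)]


# Four model estimates of a 2D position; the third model's sensor is faulty.
scores = [0.9, 0.9, 0.1, 0.9]
estimates = [[1.0, 0.0], [2.0, 0.0], [100.0, 0.0], [3.0, 0.0]]
weights = activation_probabilities(scores)
fused = fuse(estimates, weights)
```

A plain weighted sum is adequate for position components. Attitude quaternions would need renormalization (or averaging on the rotation manifold) after summation, a detail the description does not address.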
[0039] The system output and feedback control module serves as the terminal of the entire information processing chain. This module first outputs the optimal attitude and position estimates generated by the adaptive weighted fusion decision module as the system's final result. The system's performance specifications stipulate an attitude measurement accuracy of 0.1 degrees and a position measurement accuracy of 0.1 meters. Furthermore, this module integrates a crucial closed-loop correction mechanism. Referring to Figure 5 of the accompanying drawings, the mechanism works as follows: First, it combines the optimal attitude and position estimates output by the system at the current moment with a pre-established precise sensor observation model to predict the theoretically expected observation values of each sensor at this moment. Next, it calculates the differences between these predicted observation values and the actual observation values transmitted by each sensor, i.e., the residuals. The system continuously monitors these residual sequences and calculates their average value within a sliding time window. If the sliding average residual value of a sensor consistently exceeds its preset threshold, it indicates a systematic deviation between the sensor's observations and the system's optimal estimate, possibly stemming from sensor drift or minor changes in installation parameters. At this point, the closed-loop correction mechanism is triggered, initiating the online calibration parameter update process. This process employs the Levenberg-Marquardt optimization algorithm, a numerical optimization method widely used for nonlinear least squares problems. The algorithm uses the current sensor calibration parameters, including the extrinsic matrix and intrinsic distortion coefficients, as initial values, takes the sum of squared residuals over all sensor observations as the objective function, and minimizes it iteratively. Through multiple iterations, the algorithm gradually adjusts the calibration parameters to minimize the difference between the predicted and actual observations, thereby achieving online self-calibration and ensuring that the system can maintain high-precision measurements even after long-term operation.
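The closed-loop correction described above can be sketched as two parts: a sliding-window monitor that triggers recalibration, and a small Levenberg-Marquardt solver that refines the calibration parameters. The sketch below is illustrative only. The class and function names, the window, threshold and patience values, and the toy planar extrinsic model with parameters (theta, tx, ty) are all assumptions made for the example; the patent does not specify them. The Jacobian is approximated by forward differences for brevity.

```python
import math
from collections import deque

class ResidualMonitor:
    """Flags a sensor whose sliding-mean residual stays above a threshold."""
    def __init__(self, window=50, threshold=0.05, patience=10):
        self.buf = deque(maxlen=window)
        self.threshold, self.patience, self.over = threshold, patience, 0

    def update(self, residual):
        self.buf.append(residual)
        mean = sum(self.buf) / len(self.buf)
        # Count consecutive updates on which the windowed mean is out of bounds.
        self.over = self.over + 1 if abs(mean) > self.threshold else 0
        return self.over >= self.patience

def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def levenberg_marquardt(residual_fn, params, iters=50, lam=1e-3, eps=1e-6):
    """Minimize the sum of squared residuals, starting from params."""
    params = list(params)
    r = residual_fn(params)
    cost = sum(v * v for v in r)
    n = len(params)
    for _ in range(iters):
        if cost < 1e-20:
            break
        # Forward-difference Jacobian, stored by column: cols[j][i] = d r_i / d p_j.
        cols = []
        for j in range(n):
            bumped = params[:]
            bumped[j] += eps
            rb = residual_fn(bumped)
            cols.append([(x - y) / eps for x, y in zip(rb, r)])
        jtj = [[sum(u * v for u, v in zip(cols[a], cols[b])) for b in range(n)]
               for a in range(n)]
        jtr = [sum(u * v for u, v in zip(cols[a], r)) for a in range(n)]
        # Damped normal equations: (J^T J + lam * diag(J^T J)) step = -J^T r.
        # The tiny constant only guards against an exactly zero diagonal.
        for a in range(n):
            jtj[a][a] = jtj[a][a] * (1.0 + lam) + 1e-12
        step = solve(jtj, [-g for g in jtr])
        trial = [p + s for p, s in zip(params, step)]
        r_trial = residual_fn(trial)
        cost_trial = sum(v * v for v in r_trial)
        if cost_trial < cost:   # accept the step and reduce the damping
            params, r, cost, lam = trial, r_trial, cost_trial, lam * 0.1
        else:                   # reject the step and increase the damping
            lam *= 10.0
    return params

def planar_extrinsic_residuals(body_pts, observed_pts):
    """Predicted-minus-observed residuals for a toy 2D extrinsic (theta, tx, ty)."""
    def residual_fn(p):
        theta, tx, ty = p
        c, s = math.cos(theta), math.sin(theta)
        out = []
        for (bx, by), (ox, oy) in zip(body_pts, observed_pts):
            dx, dy = bx - tx, by - ty
            out += [c * dx + s * dy - ox, -s * dx + c * dy - oy]
        return out
    return residual_fn

# Toy usage: synthesize observations from a known mounting offset, then recover it.
true_p = (0.05, 0.10, -0.02)
body = [(1.0, 0.0), (0.0, 2.0), (-1.5, 1.0), (2.0, -1.0)]
# With a zero "observation", the residual equals the predicted sensor-frame point.
observed = [tuple(planar_extrinsic_residuals([b], [(0.0, 0.0)])(true_p)) for b in body]
estimate = levenberg_marquardt(planar_extrinsic_residuals(body, observed), [0.0, 0.0, 0.0])
```

The monitor here averages signed residuals, so zero-mean noise cancels out and only a persistent bias trips the trigger; averaging residual magnitudes instead would be an equally plausible reading of the text.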
[0040] The entire system, from data acquisition, preprocessing, and feature extraction to fusion decision-making and output feedback, forms a complete, automated, robust, and adaptive closed-loop system for orientation measurement. Its synchronous hardware design, precise spatiotemporal alignment, robust feature extraction, adaptive fusion based on real-time health assessment, and online feedback correction mechanism collectively ensure that it can still provide stable and reliable high-precision orientation information in complex, dynamic, and non-ideal environments, even when some sensors fail.
[0041] Example 2
[0042] This embodiment illustrates a software-implemented alternative for the spatiotemporal alignment and feature extraction preprocessing workflow of the multimodal sensor orientation measurement system, intended for cases where the field-programmable gate array (FPGA) resources of the embedded computing platform are limited or the system needs to handle higher-frequency sensor data streams. In Example 1, sensor data acquisition and spatiotemporal alignment preprocessing were assigned to the FPGA to achieve the best possible real-time performance. In this embodiment, these tasks are migrated to a multi-core ARM processor, making use of its parallel computing cores and the optimized scheduling strategy of a real-time operating system.
[0043] The hardware composition and synchronization mechanism of the sensor array module are exactly the same as in Example 1, still achieving microsecond-level hardware synchronous sampling through a central synchronization signal generator and a dedicated synchronization signal line. The difference is that the raw data streams generated by each sensor no longer directly enter the field-programmable gate array, but are directly transmitted to the memory buffer of the multi-core ARM processor through a high-speed serial peripheral interface or Ethernet interface.
[0044] The spatiotemporal alignment preprocessing function is now handled by a high-priority real-time thread running on a dedicated core of the ARM processor. This thread also executes a phase-locked loop-based clock drift compensation algorithm. The algorithm uses a 1-pulse-per-second signal from a Global Navigation Satellite System (GNSS) receiver as a reference, but the detection of the clock phase difference and the calculation of the compensation amount are entirely implemented in software. The thread maintains a high-precision timer and accurately captures the arrival timestamps of each sensor data packet through polling or interrupts. The proportional-integral (PI) controller, as part of the software algorithm, dynamically determines when to perform linear interpolation or spline interpolation resampling on subsequent sensor data packets based on the calculated phase difference. Although the synchronization accuracy of the software implementation may be slightly lower than that of a dedicated hardware field-programmable gate array (FPGA), through carefully optimized code and real-time scheduling, the timestamp synchronization error can still be controlled to the sub-millisecond level, meeting the needs of most application scenarios. The spatial coordinate transformation part, which uses a pre-calibrated extrinsic parameter matrix to transform the data to a unified carrier coordinate system, is also completed on the ARM processor using an efficient matrix operation library.
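A software phase-locked loop of the kind described above can be sketched as a proportional-integral (PI) loop updated once per PPS edge. This is a minimal illustration under stated assumptions: the class name `SoftwarePll`, the gains `kp` and `ki`, and the simulated clock (50 ppm fast with a 3 ms offset) are hypothetical values chosen for the example, not parameters from the patent. The helper `lerp` shows the linear interpolation case only.

```python
class SoftwarePll:
    """PI loop that disciplines a local timestamp clock to a 1 PPS reference."""
    def __init__(self, kp=0.5, ki=0.1):
        self.kp, self.ki = kp, ki
        self.integral = 0.0    # settles at the clock drift accumulated per PPS interval
        self.correction = 0.0  # seconds subtracted from raw local timestamps

    def on_pps(self, local_time, reference_time):
        """Update the loop at a PPS edge and return the measured phase error."""
        error = (local_time - self.correction) - reference_time
        self.integral += self.ki * error
        self.correction += self.kp * error + self.integral
        return error

    def correct(self, local_time):
        """Map a raw local timestamp onto the reference time base."""
        return local_time - self.correction

def lerp(t0, v0, t1, v1, t):
    """Linearly interpolate a scalar sample to timestamp t, with t0 <= t <= t1."""
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

# Simulated local clock: 50 ppm fast with a 3 ms initial offset (made-up numbers).
pll = SoftwarePll()
errors = []
for k in range(1, 201):
    local = k * (1.0 + 50e-6) + 0.003
    errors.append(pll.on_pps(local, float(k)))
```

In this sketch the correction is held constant between PPS edges, so a timestamp taken mid-interval still carries a fraction of one interval's drift; extrapolating with the integral term would reduce that further.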
[0045] The tasks of the multimodal feature extraction and encoding module are also distributed across multiple computing cores of the ARM processor for parallel execution. To cope with processor load, the complexity of some algorithms can be dynamically adjusted. For example, in the improved ORB feature extraction stage for the stereo vision camera images, a feature point count target control mechanism can be introduced. When the processor load is high, the adaptive threshold adjustment mechanism tends to set a higher FAST corner threshold, thereby limiting the total number of extracted feature points and ensuring real-time system response. Conversely, when the load is low, the threshold is lowered to extract richer feature information. The stereo matching algorithm can also reduce the number of aggregation paths used in semi-global matching as needed to lower the computational load. Quaternion complementary filtering for the inertial measurement unit, carrier-phase-smoothed pseudorange processing for the global navigation satellite system receiver, and the iterative closest point (ICP) algorithm for the 2D lidar are all implemented as independent software modules and distributed to run on different processor cores. Data exchange is achieved through shared memory or message passing mechanisms to ensure parallel processing efficiency.
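The load-dependent threshold adjustment described above could take a form like the following sketch. Every number in it (the feature target of 800, the load limits of 0.8 and 0.5, the step of 2, and the threshold bounds of 5 and 60) is a hypothetical placeholder, as is the function name; the patent gives no concrete values.

```python
def adjust_fast_threshold(threshold, cpu_load, n_features,
                          target=800, step=2, lo=5, hi=60):
    """Return the FAST corner threshold to use for the next frame."""
    if cpu_load > 0.8 or n_features > 1.2 * target:
        threshold += step   # fewer corners, less work downstream
    elif cpu_load < 0.5 and n_features < 0.8 * target:
        threshold -= step   # spare capacity: extract richer features
    # Clamp so the detector never becomes degenerate in either direction.
    return max(lo, min(hi, threshold))
```

Calling this once per frame with the most recent processor load and feature count gives a slow, bounded adjustment rather than an abrupt switch between two settings.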
[0046] The software implementation of the adaptive weighted fusion decision module and the system output and feedback control module is completely consistent with that described in Example 1, and they run on other cores of the ARM processor. All algorithmic logic for sensor health assessment, the multi-model interaction of the improved cubature Kalman filter, and the closed-loop correction mechanism is implemented in software code. This fully software-based architecture reduces reliance on a dedicated field-programmable gate array (FPGA), improves system flexibility and reconfigurability, and facilitates the introduction of new algorithms or parameter adjustments through software upgrades. While its ultimate real-time performance may not match that of hardware-accelerated solutions, this embodiment provides an efficient and reliable implementation for applications with lower data update rate requirements or stricter cost controls. The entire system still strictly follows the described technical chain of synchronization, alignment, feature extraction, adaptive fusion, and feedback correction to ensure the accuracy and robustness of the orientation measurement output.
[0047] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit, the scope of which is defined by the appended claims and their equivalents.
Claims
1. A multimodal sensor orientation measurement fusion system, characterized in that it comprises:
The sensor array module consists of an inertial measurement unit, a stereo vision camera, a global navigation satellite system receiver, and a 2D lidar. Each sensor achieves microsecond-level synchronous sampling through hardware trigger signals.
The spatiotemporal alignment preprocessing module receives the raw data stream from the sensor array module, unifies the timestamps of each sensor through a phase-locked loop-based clock drift compensation algorithm, and transforms the observation data of each sensor into a unified carrier coordinate system using a pre-calibrated inter-sensor extrinsic parameter matrix.
The multimodal feature extraction and encoding module performs parallel processing on the spatiotemporally aligned data. The inertial measurement unit data is solved by quaternion complementary filtering to obtain the initial value of the carrier attitude. The stereo vision camera image generates a 3D point cloud through an improved ORB feature extraction and stereo matching algorithm. The global navigation satellite system receiver signal outputs position information after carrier-phase-smoothed pseudorange processing. The 2D lidar scanning data is matched between frames through an iterative closest point algorithm to calculate the displacement.
The adaptive weighted fusion decision module receives multi-path feature vectors output by the multimodal feature extraction and encoding module. This module integrates a sensor health assessment submodule based on a sliding window. This submodule scores the health based on the confidence index and historical consistency of the output data of each sensor. The fusion core adopts an improved cubature Kalman filter framework, which extends the traditional state prediction and update steps into a multi-model interaction structure. Each model corresponds to the observation likelihood function of a sensor combination. The activation probability of each model is adjusted in real time by the output of the sensor health assessment submodule. Finally, the optimal attitude and position estimate is output through a weighted summation method.
The system output and feedback control module takes the optimal estimate generated by the adaptive weighted fusion decision module as the final output of the system. At the same time, this module has a closed-loop correction mechanism, which back-projects the current optimal estimate to the observation space of each sensor, calculates the residuals and uses them to update the sensor calibration parameters and the noise covariance matrix of the fusion filter online.
2. The multimodal sensor orientation measurement fusion system according to claim 1, characterized in that, The inertial measurement unit in the sensor array module includes a 3-axis MEMS gyroscope and a 3-axis MEMS accelerometer, with a data output frequency of 200 Hz; the stereo vision camera uses a global shutter CMOS sensor with a resolution of 1280 x 720 pixels and a frame rate of 30 Hz; the global navigation satellite system receiver supports both BeiDou and GPS systems, with a positioning data update rate of 1 Hz; and the 2D lidar has a scanning frequency of 10 Hz and an angular resolution of 0.5 degrees.
3. The multimodal sensor orientation measurement fusion system according to claim 2, characterized in that, In the sensor array module, each sensor is connected to a central synchronization signal generator via a dedicated synchronization signal line. This generator produces synchronization pulses at a frequency of 1000 Hz to ensure that the data acquisition times of all sensors are aligned within a 1-millisecond error range.
4. The multimodal sensor orientation measurement fusion system according to claim 1, characterized in that, The phase-locked loop-based clock drift compensation algorithm in the spatiotemporal alignment preprocessing module uses the 1 pulse per second signal from the global navigation satellite system receiver as the reference clock to detect the phase difference between the internal clocks of other sensors and this reference. A proportional-integral controller dynamically adjusts the interpolation time of each sensor's data, so that the timestamps of the data streams of all sensors are synchronized.
5. The multimodal sensor orientation measurement fusion system according to claim 4, characterized in that, In the spatiotemporal alignment preprocessing module, the extrinsic parameter matrix between sensors is obtained through offline hand-eye calibration. The calibration process utilizes a high-precision checkerboard calibration board that is simultaneously observed by a stereo vision camera and a 2D lidar, and the optimal rotation matrix and translation vector are solved through singular value decomposition.
6. The multimodal sensor orientation measurement fusion system according to claim 1, characterized in that, The improved ORB feature extraction algorithm in the multimodal feature extraction and encoding module introduces an adaptive threshold adjustment mechanism on the basis of traditional ORB. This mechanism dynamically calculates the FAST corner threshold for feature point extraction based on the overall grayscale distribution of the image. The stereo matching algorithm adopts a semi-global matching method, obtains a disparity map through multi-path cost aggregation, and removes mismatched points through left-right consistency checks.
7. The multimodal sensor orientation measurement fusion system according to claim 1, characterized in that, The sensor health assessment submodule based on a sliding window in the adaptive weighted fusion decision module maintains a sliding window with a length of 50 data points. For the data of each sensor in the window, the Mahalanobis distance between it and the current optimal estimate of the system is calculated, and the distance is mapped to a health score between 0 and 1. Sensors with a health score lower than 0.3 will be judged as abnormal, and their fusion weight at that moment will be automatically set to 0.
8. The multimodal sensor orientation measurement fusion system according to claim 7, characterized in that, The improved cubature Kalman filter framework in the adaptive weighted fusion decision module uses the third-degree spherical-radial cubature rule to approximate the posterior probability density function of the state. Its multi-model interaction structure contains four parallel-running filter models, corresponding to sensor combinations using only an inertial measurement unit, an inertial measurement unit plus a stereo vision camera, an inertial measurement unit plus a global navigation satellite system, and an inertial measurement unit plus a 2D lidar.
9. The multimodal sensor orientation measurement fusion system according to claim 8, characterized in that, In the adaptive weighted fusion decision module, the observation likelihood function of each model is modeled as a Gaussian distribution based on the noise characteristics of the corresponding sensor, and its mean and covariance are learned from historical data; the activation probability of each model is determined by normalizing the sensor health scores using the softmax function.
10. The multimodal sensor orientation measurement fusion system according to claim 1, characterized in that, The closed-loop correction mechanism of the system output and feedback control module combines the current optimal attitude and position estimate of the system with the known sensor model to predict the observation values that each sensor should have; calculates the residual between the predicted observation values and the actual sensor observation values; if the moving average of the residual sequence continues to exceed the preset threshold, the calibration parameter online update process is initiated. This process uses the Levenberg-Marquardt optimization algorithm to minimize the sum of squared residuals, thereby iteratively optimizing the sensor extrinsic parameter matrix and intrinsic distortion coefficients.