A pipeline robot positioning method, electronic device, computer readable storage medium and program product
By combining a filtering model with a deep learning model and using an LSTM network with a multi-head attention mechanism to process sensor data, the method addresses the loss of positioning accuracy for pipeline robots in environments without global positioning signals and achieves higher positioning accuracy.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHONGQING SPECIAL EQUIP TESTING & RES INST (CHONGQING SPECIAL EQUIP ACCIDENT EMERGENCY INVESTIGATION & PROCESSING CENT)
- Filing Date
- 2026-05-15
- Publication Date
- 2026-06-16
AI Technical Summary
In confined environments without global positioning signals, the positioning accuracy of pipeline robots decreases. This is especially true in urban underground drainage networks, where degraded sensor data quality significantly worsens estimation performance.
A filtering model is combined with a deep learning model: an improved robust extended Kalman filter produces the initial position estimate, and an LSTM network with a multi-head attention mechanism processes the sensor data into deep temporal features, from which the position of the pipeline robot is output.
The method improves the positioning accuracy of pipeline robots, suppresses jumps in the predicted position, enhances the ability to capture global dependencies, and improves the generalization performance of the network.
Figure CN122217331A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of positioning and state estimation technology, and more specifically, to a pipeline robot positioning method, electronic device, computer-readable storage medium, and program product.
Background Technology
[0002] Localization is a core foundational technology for mobile robots performing autonomous navigation and tasks. Common robot localization methods, such as those based on the Global Positioning System (GPS), Ultra-Wideband (UWB), and radar, require real-time acquisition of satellite, base station, and environmental information. However, in certain scenarios, such as confined environments like urban underground drainage networks, global information is blocked, base stations are difficult to install, and the environment is often narrow and dimly lit, rendering these commonly used methods ineffective. Therefore, there is an urgent need to develop a short-to-medium range effective relative localization method. To address these scenarios, robots are typically equipped with multi-source sensors, such as wheel encoders and inertial measurement units (IMUs), using sensor fusion to leverage their complementary advantages for state estimation. Among these, wheel encoders and inertial sensors play a crucial role in relative localization because they require no external signals, do not interact with the environment, are inexpensive, and consume relatively few computational resources.
[0003] In pipeline scenarios, degraded sensor data quality often leads to a significant deterioration in estimation performance, especially when global correction information is lacking. For example, in underground pipeline scenarios, obstacles such as stones and silt accumulated after heavy rains and floods can cause sensor data anomalies, resulting in reduced positioning accuracy.
Summary of the Invention
[0004] In view of this, the purpose of this application is to provide a pipeline robot positioning method, electronic device, computer-readable storage medium and program product, which can mitigate the loss of positioning accuracy in confined environments without global positioning signals.
[0005] To achieve the above technical objectives, the technical solution adopted in this application is as follows:
[0006] In a first aspect, embodiments of this application provide a pipeline robot positioning method, the method comprising:
[0007] Acquire actual sensor data during the operation of the pipeline robot;
[0008] The actual sensor data is input into the filtering model, and the initial estimated position of the pipeline robot is obtained through the filtering model.
[0009] Based on the initial estimated location and the actual sensor data, the actual derived data is obtained;
[0010] The initial estimated position, actual sensor data, and actual derived data are input into a deep learning model. The deep learning model uses an LSTM network with a multi-head attention mechanism to process the initial estimated position, actual sensor data, and actual derived data into deep temporal features. Based on the deep temporal features, the final estimated position of the pipeline robot is obtained, wherein the final estimated position is used to locate the pipeline robot.
[0011] According to the first aspect, the deep learning model includes an LSTM network, which includes LSTM layers, normalization layers, and multi-head attention layers;
[0012] The deep learning model uses an LSTM network with a multi-head attention mechanism to process the initial estimated location, actual sensor data, and actual derived data into deep temporal features, including:
[0013] The initial estimated location, actual sensor data, actual derived data, and corresponding time steps are used as initial input features and input into the LSTM layer. The LSTM layer outputs a hidden state sequence based on the initial input features.
[0014] The hidden state sequence is input into the normalization layer, which normalizes the components of the hidden state at each time step to obtain a normalized sequence.
[0015] The normalized sequence is input into the multi-head attention layer. Based on the normalized sequence and a preset model, the multi-head attention layer calculates the self-attention corresponding to each head in the multi-head attention layer, and obtains the deep temporal features through all the self-attentions.
[0016] According to the first aspect, the preset model is:
[0017] $$\mathrm{head}_i = \mathrm{softmax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i, \qquad Q_i = X W_i^{Q},\quad K_i = X W_i^{K},\quad V_i = X W_i^{V}, \qquad d_k = d / h$$
[0018] where $Q_i$ represents the query matrix of the i-th head, i = 1, 2, ..., h, and h represents the number of heads;
[0019] $K_i$ represents the key matrix of the i-th head;
[0020] $V_i$ represents the value matrix of the i-th head;
[0021] $W_i^{Q}$, $W_i^{K}$ and $W_i^{V}$ respectively represent the projection matrix of the query matrix, the projection matrix of the key matrix, and the projection matrix of the value matrix learned by the i-th head;
[0022] $d_k$ is the dimension of the normalized sequence at each head, $d$ represents the dimension of the hidden state sequence, and $X$ denotes the normalized sequence.
[0023] According to the first aspect, the deep learning model further includes a first branch and a second branch;
[0024] The process of obtaining the final estimated position of the pipeline robot based on the deep temporal features includes:
[0025] The deep temporal features are input into the first branch and the second branch respectively. The first branch outputs the X coordinate of the pipeline robot and the second branch outputs the Y coordinate. The X coordinate and the Y coordinate together form the final estimated position, and both are expressed in the same specified coordinate system.
[0026] According to the first aspect, the second branch includes a first path, a second path, and a fusion path, wherein the fusion path includes a fully connected layer, a residual connected layer, a Dropout layer, and an output layer;
[0027] The step of outputting the Y-coordinate of the pipeline robot through the second branch includes:
[0028] The deep temporal features are input into a first path and a second path, respectively. The first path maps the deep temporal features to a high-dimensional space to obtain high-dimensional temporal features, whose dimension is greater than that of the deep temporal features. The second path performs convolution processing on the deep temporal features and compresses the spatiotemporal information of the convolved features to obtain local temporal features, whose dimension is the same as that of the deep temporal features.
[0029] The high-dimensional temporal features and local temporal features are concatenated to obtain the concatenated temporal features;
[0030] The concatenated temporal features are input into a fully connected layer, which compresses them to obtain compressed temporal features with the same dimension as the deep temporal features.
[0031] Both the compressed temporal features and the local temporal features are input into the residual connection layer. The residual connection layer adds the compressed temporal features and the local temporal features element by element to obtain the residual connection temporal features.
[0032] The residual connection temporal features are input into the Dropout layer, and the Dropout layer regularizes the residual connection temporal features to obtain regularized temporal features.
[0033] The regularized temporal features are input into the output layer, and the output layer outputs the Y coordinate based on the regularized temporal features.
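The data flow through the second branch can be illustrated with a minimal pure-Python sketch. It assumes the deep temporal feature is a single vector, uses a zero-padded 1-D convolution in place of the "convolution plus compression" step so that the local features keep the input dimension, and takes all weights from a hypothetical `params` dictionary; none of these names or sizes come from the application.

```python
import random

def linear(x, w, b):
    """Dense layer: w has shape out_dim x in_dim."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def conv1d_same(x, kernel):
    """1-D convolution with zero padding so the output keeps len(x) (odd kernel)."""
    pad = len(kernel) // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(k * xp[i + j] for j, k in enumerate(kernel)) for i in range(len(x))]

def y_branch(feat, params, drop_p=0.0, rng=random):
    high = linear(feat, *params["expand"])            # first path: d -> larger dimension
    local = conv1d_same(feat, params["kernel"])       # second path: local features, dimension d
    merged = high + local                             # concatenation of both paths
    compressed = linear(merged, *params["compress"])  # fully connected layer: back to d
    res = [c + l for c, l in zip(compressed, local)]  # element-wise residual addition
    # Inverted dropout; drop_p = 0 (inference) leaves the features unchanged.
    res = [0.0 if rng.random() < drop_p else r / (1.0 - drop_p) for r in res]
    return linear(res, *params["out"])[0]             # output layer: scalar Y coordinate
```

With the expansion and compression weights set to zero and an identity kernel, the residual path alone carries the local features to the output layer, which makes the role of the residual connection easy to check by hand.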
[0034] According to the first aspect, inputting the actual sensor data into a filtering model and obtaining the initial estimated position of the pipeline robot through the filtering model includes:
[0035] A1: Based on the actual sensor data, the degree of certainty at time j is obtained using the filtering model, where k represents the current time, j represents a time step within the sliding window [k-L+1, k], and L represents the length of the sliding window;
[0036] A2: The degree of certainty and the pre-stored initial measurement noise covariance matrix are input into a pre-established cloud theory model, and the cloud theory model is used to obtain the measurement noise covariance matrix at time j;
[0037] A3: Based on the prior state at time j, the actual sensor data at time j, and the measurement noise covariance matrix, the filtering model determines the expectation of the auxiliary variable;
[0038] A4: Based on the expectation of the measurement noise covariance matrix and the expectation of the auxiliary variable, the equivalent measurement noise variance is determined;
[0039] A5: Based on the equivalent measurement noise variance, a Kalman update is performed on the prior state and the prior covariance to obtain the posterior state and the posterior covariance at time j;
[0040] A6: The posterior state is taken as the prior state and the posterior covariance as the prior covariance, and A3-A5 are executed a specified number of times;
[0041] A7: The initial estimated position is obtained based on the posterior state obtained in A6 and the actual sensor data.
[0042] According to the first aspect, prior to acquiring the actual sensor data during the operation of the pipeline robot, the method further includes:
[0043] Acquire multiple historical sensor data and the historical real location corresponding to each of the historical sensor data;
[0044] Each piece of historical sensor data is input into a filtering model, and the historical estimated location corresponding to each piece of historical sensor data is obtained through the filtering model.
[0045] Based on the historical estimated locations and historical sensor data, historical derived data is obtained;
[0046] The historical sensor data, along with the corresponding historical estimated locations, historical derived data, and historical true locations, are integrated into training data.
[0047] An initialized deep learning model is trained based on the training data, so that after the initial estimated position, actual sensor data, and actual derived data are input into the deep learning model, the deep learning model outputs the final estimated position.
[0048] In a second aspect, embodiments of this application also provide an electronic device including a processor and a memory coupled to each other, the memory storing a computer program which, when executed by the processor, causes the electronic device to perform the method described in the first aspect.
[0049] In a third aspect, embodiments of this application also provide a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to perform the method described in the first aspect.
[0050] In a fourth aspect, embodiments of this application also provide a program product including a computer program which, when executed by a processor, implements the method described in the first aspect.
[0051] The invention employing the above technical solution has the following advantages:
[0052] The technical solution provided in this application combines the smoothness of the filtering model with the nonlinear learning capability of the deep learning model. By using the position prediction results of the Kalman filter as network input, the network can learn the smoothness of the filtering method, suppress prediction jump problems, and handle nonlinear features well. Furthermore, a multi-head attention mechanism is introduced on top of the Long Short-Term Memory network, enhancing the network's ability to capture global dependencies and its generalization performance. This invention improves the positioning accuracy of pipeline robots.
Attached Figure Description
[0053] This application can be further illustrated by the non-limiting embodiments given in the accompanying drawings. It should be understood that the following drawings only illustrate some embodiments of this application and should not be considered as limiting the scope. For those skilled in the art, other related drawings can be obtained from these drawings without any inventive effort.
[0054] Figure 1 is a flowchart of the pipeline robot positioning method provided in the embodiments of this application.
[0055] Figure 2 shows the LSTM neuron architecture provided in the embodiments of this application.
[0056] Figure 3 shows the structure of the multi-head attention layer provided in the embodiments of this application.
[0057] Figure 4 is a schematic diagram of the self-attention mechanism.
Detailed Implementation
[0058] The present application will be described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that similar or identical parts are referred to by the same reference numerals in the drawings or description. Implementations not shown or described in the drawings are forms known to those skilled in the art. In the description of this application, terms such as "first" and "second" are used only to distinguish descriptions and should not be construed as indicating or implying relative importance.
[0059] Referring to Figure 1, this application provides a pipeline robot positioning method, which can be applied to an electronic device that executes or implements its steps. The electronic device can be, but is not limited to, a personal computer or a smartphone. The pipeline robot positioning method may include the following steps:
[0060] S110, acquire actual sensor data during the operation of the pipeline robot;
[0061] S120, The actual sensor data is input into the filtering model, and the initial estimated position of the pipeline robot is obtained through the filtering model;
[0062] S130, Based on the initial estimated position and the actual sensor data, obtain the actual derived data;
[0063] S140, the initial estimated position, actual sensor data, and actual derived data are input into a deep learning model. The deep learning model uses an LSTM network with a multi-head attention mechanism to process the initial estimated position, actual sensor data, and actual derived data into deep temporal features, and obtains the final estimated position of the pipeline robot based on the deep temporal features. The final estimated position is used to locate the pipeline robot.
[0064] The steps of the pipeline robot positioning method will be explained in detail below:
[0065] In S110, an inertial measurement unit and a wheel encoder mounted on the pipeline robot can be used to synchronously collect sensor data during the robot's movement. The sensor data in this embodiment include wheel encoder data (linear velocity in the x direction and displacement increment in the x direction) and inertial measurement unit data (three-axis acceleration and three-axis angular velocity). The three axes refer to the X, Y, and Z axes of the geodetic coordinate system, and x and y in this embodiment are likewise expressed in the geodetic coordinate system.
[0066] In S120, this embodiment processes the sensor data using an improved robust extended Kalman filter to obtain the initial position estimation result. In addressing the issue that sensor data in urban underground drainage network scenarios is susceptible to outlier contamination, the measurement noise distribution is reconstructed into a Student t-distribution, and an auxiliary random variable is introduced to represent the Student t-distribution in Gaussian hierarchical form. Variational inference is used to derive the filtering formula, and cloud model theory is introduced to calculate the determinism of outliers at each time step and dynamically adjust the measurement noise covariance matrix.
[0067] This embodiment addresses the issue of outlier contamination in sensor data within urban underground drainage network scenarios by using an improved robust extended Kalman filter for the initial position estimation. The filter models the measurement noise with a Student t-distribution in order to handle non-Gaussian noise. At time k, the noise distribution model is as follows:
[0068] $$p(z_k \mid x_k) = \mathrm{St}\left(z_k;\ H_k x_k,\ R_k,\ \nu\right)$$
[0069] where $x_k$ and $z_k$ represent the state variable and the measurement, respectively, $H_k$ represents the observation matrix, $R_k$ represents the measurement noise covariance matrix, and $\nu$ represents the degrees of freedom. For ease of calculation, an auxiliary random variable is introduced and the Student t-distribution is represented in Gaussian hierarchical form. This embodiment, based on the SWRKF (sliding window variational outlier-robust KF filter) algorithm, incorporates cloud model theory to calculate the degree of certainty that the data at each time step is an outlier and dynamically adjusts the value of $R_k$, which yields the CM-SWREKF algorithm. Here j denotes a time step within the sliding window, k denotes the current time, and L denotes the length of the sliding window.
[0070] Based on this, S120 may specifically include the following steps:
[0071] S121: Determine the degree of certainty at time k based on the actual sensor data using the filtering model;
[0072]
[0073] where the terms are, in turn, the expected value of the actual sensor data, the entropy of the sensor data, and the actual sensor data at time k. The components of the sensor data represent the linear velocity in the x-direction, the angle about the z-axis, and the angular velocity, respectively.
[0074] S122: The degree of certainty and the pre-stored initial measurement noise covariance matrix are input into a pre-established cloud theory model, and the cloud theory model is used to obtain the measurement noise covariance matrix at that time. The measurement noise covariance matrix is used to characterize the degree of anomaly of the actual sensor data, where the cloud theory model is:
[0075]
[0076] where the model uses the sigmoid function; the transition parameter (also known as the "temperature parameter") is greater than 0 and controls the steepness of the transition; a > 0 represents the amplification factor; the maximum degree of certainty at which data within a specific time period are identified as outliers is a constant that can be specified independently; and the remaining term represents the initial measurement noise covariance matrix.
[0077] S123: Based on the prior state, the actual sensor data at that time step, and the measurement noise covariance matrix, the filtering model determines the expectation of the auxiliary variable.
[0078] S124: Based on the expectation of the measurement noise covariance matrix and the expectation of the auxiliary variable, the equivalent measurement noise variance is determined.
[0079] The equivalent measurement noise variance is expressed as:
[0080]
[0081] In this embodiment, considering the properties of conjugate priors, the inverse Wishart (IW) distribution is used as the conjugate prior for the covariance matrix of a normal distribution, thereby simplifying the subsequent variational Bayes (VB) inference. The specific expression is as follows:
[0082]
[0083] where the two parameters represent the prior degrees-of-freedom parameter and the prior scale matrix of the measurement noise covariance matrix, respectively. Because data from a specific time period are used, a forgetting factor is introduced, and the parameters are updated as follows:
[0084]
[0085] where the two quantities are the posterior degrees-of-freedom parameter and the posterior scale matrix of the measurement noise covariance matrix at the corresponding time step.
[0086] Set the initial value to:
[0087]
[0088] To calculate the posterior probability distribution, the VB method is used to jointly estimate the state variables, the process noise covariance matrix, the measurement noise covariance matrix, and the auxiliary random variables within the sliding window [k-L+1, k]. This embodiment defines the set of these variables. Since the estimated variables in the VB method are treated as mutually independent, the joint posterior distribution can be approximated as:
[0089]
[0090] Combining this with the Kullback-Leibler (KL) divergence, we get:
[0091]
[0092] where the variable being updated is any one element of the set, and the expectation operator denotes the expected value of a variable. Therefore, based on the Student t-distribution, the joint distribution in the above equation can be expressed as:
[0093]
[0094] Each variable is calculated using a fixed-point iteration method, component by component, and the following results can be obtained:
[0095]
[0096] The update method for each parameter is as follows:
[0097]
[0098] Based on the above three formulas, the expected value can be obtained:
[0099]
[0100] where the two quantities represent the prior value at time j in the (i+1)-th iteration and its corresponding error covariance matrix, respectively. They are obtained by adding backward smoothing after the filtering steps. This calculation method is disclosed in the prior art and will not be described in detail in this embodiment.
[0101] The remaining two symbols represent the degrees-of-freedom parameter of the auxiliary variable distribution (Gamma distribution) and the dimension of the measurement vector, respectively.
[0102] The Gamma distribution is written in abbreviated form.
[0103] S125: Based on the equivalent measurement noise variance, a Kalman update is performed on the prior state and the prior covariance to obtain the posterior state and the posterior covariance at that time step.
[0104] Based on the above variational Bayes inference process, the Kalman update process is as follows:
[0105]
[0106] where i represents the number of iterations;
[0107] I represents the identity matrix;
[0108] the prior state term represents the prior state at time j after i iterations;
[0109] the prior covariance term represents the prior state covariance matrix at time j after i iterations;
[0110] the gain term represents the Kalman gain at time j after i+1 fixed-point iterations.
[0111] S126: The posterior state is taken as the prior state and the posterior covariance as the prior covariance, and S123-S125 are executed a specified number of times;
[0112] S127: The initial estimated position is obtained based on the posterior state obtained in S126 and the actual sensor data.
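The interplay of S123-S125 can be illustrated with a simplified scalar sketch of a Student-t robust Kalman update. It is not the sliding-window CM-SWREKF itself: it handles a single time step, keeps the prior fixed across the fixed-point iterations, and treats the measurement noise variance `r` as known. The function name and the defaults `nu` and `n_iter` are assumptions chosen for illustration.

```python
def robust_update(x_prior, p_prior, z, h, r, nu=5.0, n_iter=5):
    """Scalar Kalman update with Student-t measurement noise (fixed-point iteration)."""
    m = 1          # dimension of the measurement vector
    lam = 1.0      # initial expectation of the auxiliary variable
    x_post, p_post = x_prior, p_prior
    for _ in range(n_iter):
        r_eq = r / lam                                # equivalent measurement noise variance
        gain = p_prior * h / (h * p_prior * h + r_eq) # Kalman gain
        x_post = x_prior + gain * (z - h * x_prior)   # posterior state
        p_post = (1.0 - gain * h) * p_prior           # posterior covariance
        # Expected squared residual under the current posterior.
        resid2 = (z - h * x_post) ** 2 + h * p_post * h
        lam = (nu + m) / (nu + resid2 / r)            # updated expectation of the auxiliary variable
    return x_post, p_post, lam
```

For a measurement far from the prior, the expectation of the auxiliary variable shrinks, the equivalent noise variance grows, and the update stays close to the prior instead of jumping toward the outlier.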
[0113] In this embodiment, the cloud theory model is used to dynamically assess the degree of anomaly of the current measurement. Specifically, it is used to construct a cloud model describing the qualitative concept of "normal measurement noise," characterized by three parameters: expectation, entropy, and hyperentropy. The degree of certainty reflects whether the current measurement is abnormal: the closer to 1, the more normal it is; the closer to 0, the more likely it is to be an anomaly. Subsequently, the algorithm dynamically adjusts the measurement noise covariance matrix based on this degree of certainty: when the degree of certainty is low (i.e., the probability of measurement anomaly is high), the measurement noise covariance is increased, thereby automatically reducing the weight of that measurement in subsequent variational Bayesian updates.
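The idea of turning a certainty degree into an inflated measurement noise can be sketched as follows. The certainty function is the standard forward normal cloud generator with expectation `ex`, entropy `en` and hyperentropy `he`. The inflation rule is only a hypothetical stand-in built from the ingredients named in this embodiment (a sigmoid, a temperature parameter `tau`, an amplification factor `a`, and a certainty threshold `mu_max`); the exact cloud theory model of the application is not reproduced here.

```python
import math
import random

def certainty(z, ex, en, he=0.0, rng=random):
    """Certainty degree of sample z under a normal cloud (Ex, En, He)."""
    # Draw a perturbed entropy En' ~ N(En, He^2); with He = 0 this is just En.
    en_prime = rng.gauss(en, he) if he > 0 else en
    en_prime = max(abs(en_prime), 1e-9)  # guard against a zero spread
    return math.exp(-((z - ex) ** 2) / (2.0 * en_prime ** 2))

def inflate_noise(r0, mu, mu_max=0.5, tau=0.05, a=10.0):
    """Hypothetical rule: inflate r0 smoothly once certainty drops below mu_max."""
    gate = 1.0 / (1.0 + math.exp(-(mu_max - mu) / tau))  # close to 1 for likely outliers
    return r0 * (1.0 + a * gate)
```

A measurement with certainty near 1 leaves the noise variance essentially at its initial value, while a certainty near 0 multiplies it by roughly `1 + a`, which lowers the weight of that measurement in the subsequent update.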
[0114] After the specified number of iterations, the initial estimated position is obtained from the posterior state and the actual sensor data by means of the observation matrix. This calculation method is disclosed in the prior art and will not be elaborated upon in this embodiment.
[0115] In this embodiment, it can be understood that both the initial estimated position and the final estimated position can be represented by x-coordinate values and y-coordinate values in the geodetic coordinate system.
[0116] In S130, the actual derived data includes cumulative displacement in the x-direction, jerk in the x-direction, equivalent rotational energy around the X-axis, composite motion amplitude, and hysteresis characteristics. These are all obtained through mathematical calculations based on the sensor's raw data and filtered estimation results. Specifically, the cumulative displacement in the x-direction is obtained by integrating the encoder velocity or using the filtered coordinate difference, reflecting the total travel; the jerk in the x-direction is obtained by the second-order difference of the velocity, sensitively capturing sudden motion changes such as slippage; the equivalent rotational energy around the X-axis is calculated by the square of the IMU angular velocity, characterizing the roll or torsional intensity; the composite motion amplitude is calculated by the magnitude of the velocity or acceleration vector, describing the overall motion intensity; and the hysteresis characteristics are taken from the feature values of several past moments, explicitly introducing time delay information.
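The derived quantities listed above can be computed with a few lines of pure Python. The sketch below assumes a constant sampling period `dt`, uses the acceleration vector for the composite motion amplitude, and uses lagged velocities as the hysteresis (lag) features; the function name, argument layout and lag choice are illustrative assumptions rather than details from the application.

```python
import math

def derived_features(vx, gyro_x, accel, dt, lags=(1, 2)):
    """Compute per-step derived features from raw samples.

    vx      : encoder linear velocity in x, one value per time step
    gyro_x  : IMU angular velocity about the X axis
    accel   : list of (ax, ay, az) acceleration tuples
    dt      : sampling period in seconds (assumed constant)
    lags    : how many steps back to look for the lag features
    """
    rows, cum_disp = [], 0.0
    for k in range(len(vx)):
        # Cumulative x displacement: rectangular integration of encoder velocity.
        cum_disp += vx[k] * dt
        # Jerk: second-order backward difference of velocity (zero for the first two samples).
        jerk = (vx[k] - 2 * vx[k - 1] + vx[k - 2]) / dt ** 2 if k >= 2 else 0.0
        # Equivalent rotational energy: squared angular velocity about X.
        rot_energy = gyro_x[k] ** 2
        # Composite motion amplitude: Euclidean norm of the acceleration vector.
        amplitude = math.sqrt(sum(a * a for a in accel[k]))
        # Lag features: velocity from earlier steps, padded with the first sample.
        lagged = [vx[max(k - n, 0)] for n in lags]
        rows.append([cum_disp, jerk, rot_energy, amplitude, *lagged])
    return rows
```

Each returned row can be appended to the raw sensor sample and the filter estimate of the same time step to form one input vector for the network.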
[0117] In this embodiment, inputting actual derived data into the deep learning model is to compensate for the lack of information in describing motion states by the original sensor data and filtering results, enabling a more comprehensive perception of the robot's dynamic characteristics and abnormal events. While original sensor data (such as wheel encoder linear velocity, IMU acceleration, and angular velocity) provides basic physical quantities, it is susceptible to noise, slippage, or drift, and struggles to directly reflect deeper features such as cumulative errors, sudden motion changes, or attitude shifts. By introducing cumulative displacement in the x-direction, jerk, equivalent rotational energy around the x-axis, composite motion amplitude, and hysteresis features, the model can obtain additional information about total travel, sensitivity to sudden motion changes, tilt intensity, overall motion state, and historical dependencies. These derived features enrich the input representation from different dimensions, helping the network more accurately capture the complex mapping between the real position and sensor observations during the learning process.
[0118] This embodiment, based on the SWRKF (sliding window variational outlier-robust KF filter) algorithm, introduces cloud model theory to calculate the degree of certainty that the data at each time step is an outlier and dynamically adjusts the value of R_k, which yields the CM-SWREKF algorithm. This embodiment constructs a deep learning model. The input to the deep learning model includes sensor data, position data obtained by filtering estimation, and derived data (including cumulative displacement in the x-direction, jerk in the x-direction, equivalent rotational energy around the x-axis, composite motion amplitude, and hysteresis features).
[0119] The deep learning model mentioned in S140 can be trained in the following way:
[0120] Acquire multiple historical sensor data and the corresponding historical true location for each historical sensor data; input each historical sensor data into a filtering model to obtain the historical estimated location for each historical sensor data; obtain historical derived data based on the historical estimated location and historical sensor data; integrate the historical sensor data, the corresponding historical estimated location, historical derived data, and historical true location into training data; train an initialized deep learning model based on the training data, so that after the initial estimated location, actual sensor data, and actual derived data are input into the deep learning model, the deep learning model outputs the final estimated location.
[0121] The loss function during training can be expressed as:
[0122]
[0123] This represents the total loss value;
[0124] The mean-square-error term is computed between the predicted value of the deep learning model and the corresponding historical real location;
[0125] Understandably, the actual historical location is also represented by the x and y coordinates in the geodetic coordinate system.
[0126] In this embodiment, sensor data and their corresponding real locations at multiple historical moments are first collected and input into a filtering model (CM-SWREKF) to obtain the historical estimated position for each moment. Then, based on the historical estimated position and historical sensor data, historical derived data (including cumulative displacement, jerk, equivalent rotational energy, composite motion amplitude, and hysteresis features) are calculated through mathematical transformations. The historical sensor data, historical estimated position, and historical derived data are used together as input features, with the historical real position as the supervision signal, to construct a training dataset. After initializing the deep learning model, a joint loss function including real position error and filtering smoothness error is used for training, enabling the model to learn the smoothing characteristics of the filtering method while fitting the real trajectory. After training is complete, in the testing phase, only actual sensor data needs to be input, and the model can directly output the final estimated position.
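A joint loss of the kind described here, combining the real position error with a filtering smoothness error, might look like the sketch below. The weighted-sum form and the weight `alpha` are assumptions made for illustration; the exact loss expression of the application is not reproduced.

```python
def mse(a, b):
    """Mean squared error between two equal-length lists of (x, y) points."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2
               for (ax, ay), (bx, by) in zip(a, b)) / len(a)

def joint_loss(pred, truth, filtered, alpha=0.1):
    """Position error plus a weighted penalty for straying from the filter track."""
    return mse(pred, truth) + alpha * mse(pred, filtered)
```

The first term pulls the prediction toward the ground-truth trajectory, while the second keeps it near the smooth filter output, which is one way to discourage jumps between consecutive predictions.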
[0127] In this embodiment, the deep learning model includes an LSTM network, which comprises LSTM layers, normalization layers, and multi-head attention layers. LSTM is a special recurrent neural network architecture designed to solve the vanishing gradient problem that traditional RNNs face when processing long sequences. Its core innovation lies in controlling the flow of information through a gating mechanism. The LSTM neuron architecture is shown in Figure 2. The gating mechanism of the LSTM layer is as follows:
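[0128] $$\begin{aligned} f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \\ i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \\ \tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) \\ c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\ o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \\ h_t &= o_t \odot \tanh(c_t) \end{aligned}$$
[0129] where $b$ represents the bias vector,
[0130] $W$ represents the weight matrix, and $\odot$ represents element-wise multiplication. Layer normalization standardizes individual samples along the feature dimension to ensure the stability of inputs to subsequent modules.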
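As an illustration of the gating mechanism, the following sketch runs one time step of a standard LSTM cell with a single hidden unit and a scalar input. The dictionary layout of the weights `W` and biases `b` is a hypothetical convention chosen to keep the example short.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step for a single hidden unit and scalar input.

    W maps gate name -> (weight on h_prev, weight on x); b maps gate name -> bias.
    """
    def pre(g):
        return W[g][0] * h_prev + W[g][1] * x + b[g]
    f = sigmoid(pre("f"))            # forget gate
    i = sigmoid(pre("i"))            # input gate
    o = sigmoid(pre("o"))            # output gate
    c_tilde = math.tanh(pre("c"))    # candidate cell state
    c = f * c_prev + i * c_tilde     # new cell state
    h = o * math.tanh(c)             # new hidden state
    return h, c
```

Calling `lstm_step` repeatedly over a sequence and collecting each returned `h` gives the hidden state sequence that is later normalized and passed to the attention layer.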
[0131] The multi-head attention layer in this embodiment derives from the Transformer, a deep learning architecture built on a self-attention mechanism and consisting of multiple encoders and decoders. Each layer employs a multi-head self-attention mechanism and a feedforward network, as shown in Figure 3. This invention mainly employs the multi-head attention mechanism to capture the global dependencies between different time steps in the input sequence. In the self-attention mechanism (as shown in Figure 4), the similarity weights between the current position and other positions are calculated through linear mappings of query (Q), key (K), and value (V), and the final output is formed by weighted summation. The calculation formula for the attention mechanism is:
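[0132] $$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
[0133] where $Q$ is the matrix into which the queries are packed, and the keys and values are likewise packed into the matrices $K$ and $V$; $d_k$ refers to the dimension of the input data. The multi-head attention mechanism linearly maps the queries, keys, and values $h$ times, through different learned linear projections, to dimensions $d_k$, $d_k$, and $d_v$, respectively. The attention function is then executed in parallel on each projected version of the queries, keys, and values, yielding $d_v$-dimensional output values. This can be represented as:
[0134] $$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\,W^{O}, \qquad \mathrm{head}_i = \mathrm{Attention}\left(QW_i^{Q}, KW_i^{K}, VW_i^{V}\right)$$
[0135] where the projection matrices $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$, and $W^{O}$ are parameter matrices.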
[0136] This embodiment does not use the decoder portion of the complete Transformer; instead, it employs only the multi-head attention mechanism from the encoder to adapt to the sequence-to-single-point regression task.
[0137] Therefore, the process of obtaining deep temporal features based on LSTM networks can be described as follows:
[0138] The initial estimated position, actual sensor data, actual derived data, and corresponding time steps are used as initial input features and input to the LSTM layer. The LSTM layer outputs a hidden state sequence based on the initial input features. The hidden state sequence is then input to the normalization layer, which normalizes all components of the hidden state sequence at each time step to obtain a normalized sequence. The normalized sequence is then input to the multi-head attention layer, which calculates the self-attention corresponding to each head based on the normalized sequence and a preset model. The deep temporal features are obtained from all of the self-attentions.
[0139] The above process suppresses prediction jumps because the position output by the filter is included in the input data, so that the network inherits the smoothing characteristics of the filter as it learns.
[0140] In this embodiment, the initial estimated location, actual sensor data, actual derived data, and corresponding time steps are used as initial input features. After being input into the LSTM layer, the LSTM unit selectively remembers and forgets information at each time step in the sequence through the synergistic effect of the forget gate, input gate, and output gate. It gradually updates the memory units and hidden states in the time dimension, and finally outputs a hidden state sequence containing all time steps. This sequence transforms the original input into a deep temporal feature with long-term dependencies.
[0141] The hidden state sequence output from the LSTM layer is then fed into a normalization layer. This layer operates independently at each time step, calculating the mean and variance of all components within the feature vector at the current time step. These two statistics are then used to standardize each component of the vector, stabilizing their distribution. Finally, an affine transformation is performed using learnable scaling and translation parameters. This process maintains a uniform scale across the feature dimensions, providing a stable input distribution for subsequent attention layers.
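The per-time-step normalization described above can be sketched as follows. The stabilizing constant `eps` and the parameter names `gamma` and `beta` are assumptions for illustration; the source does not name them.

```python
import numpy as np

def layer_norm(hidden_states, gamma, beta, eps=1e-5):
    """Normalize each time step of a (T, d) hidden state sequence independently.

    gamma and beta are the learnable scaling and translation parameters, shape (d,).
    eps is a small constant (assumed value) that avoids division by zero.
    """
    mean = hidden_states.mean(axis=-1, keepdims=True)   # mean over the feature vector
    var = hidden_states.var(axis=-1, keepdims=True)     # variance over the feature vector
    normalized = (hidden_states - mean) / np.sqrt(var + eps)
    return gamma * normalized + beta                    # affine transformation
```

Because the statistics are taken along the feature axis only, two time steps with very different magnitudes are brought to the same scale before entering the attention layer.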
[0142] After the normalized sequence is input into a multi-head attention layer, the layer first maps the input into multiple sets of queries, keys, and values through multiple sets of learnable linear projections, with each set corresponding to an attention head. Within each head, the self-attention weights are obtained by calculating the dot product of the query and key, followed by scaling and softmax processing. The values are then weighted and summed using these weights to generate the output of that head. Subsequently, the outputs of all heads are concatenated and fused through an output projection matrix to finally obtain the deep temporal features. This feature allows each time step to incorporate global contextual information from all time steps in the sequence, compensating for the limitations of local modeling in LSTM.
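The steps in the preceding paragraph (per-head projection, scaled dot product, softmax, weighted sum, concatenation, output projection) can be sketched in NumPy as below. The array layout, the choice of equal key and value dimensions, and all names are assumptions made for this sketch.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o):
    """Self-attention over a normalized sequence x of shape (T, d_model).

    w_q, w_k, w_v: per-head projections of shape (h, d_model, d_k)
                   (value dimension assumed equal to d_k in this sketch).
    w_o: output projection of shape (h * d_k, d_model).
    Returns the fused features (T, d_model) and the attention weights (h, T, T).
    """
    heads, weights_per_head = [], []
    for wq_i, wk_i, wv_i in zip(w_q, w_k, w_v):
        q, k, v = x @ wq_i, x @ wk_i, x @ wv_i
        d_k = q.shape[-1]
        weights = softmax(q @ k.T / np.sqrt(d_k))   # (T, T) similarity weights
        heads.append(weights @ v)                   # weighted sum of the values
        weights_per_head.append(weights)
    concat = np.concatenate(heads, axis=-1)         # concatenate all heads
    return concat @ w_o, np.stack(weights_per_head)
```

Each row of a head's weight matrix sums to one, so every output time step is a convex combination of the values at all time steps, which is how global context enters the features.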
[0143] The preset model in this embodiment is:
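[0144] $$\mathrm{head}_i = \mathrm{softmax}\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right)V_i, \qquad Q_i = QW_i^{Q},\quad K_i = KW_i^{K},\quad V_i = VW_i^{V}, \qquad d_k = \frac{d_{\mathrm{model}}}{h}, \qquad i = 1, 2, \dots, h$$
[0145] where $Q_i$ represents the query matrix of the i-th head, and $h$ indicates the number of heads;
[0146] $K_i$ represents the key matrix of the i-th head;
[0147] $V_i$ represents the value matrix of the i-th head;
[0148] $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ respectively represent the projection matrix of the query matrix, the projection matrix of the key matrix, and the projection matrix of the value matrix learned by the i-th head;
[0149] $d_k$ is the dimension of the normalized sequence at each head, and $d_{\mathrm{model}}$ represents the dimension of the hidden state sequence.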
[0150] In this embodiment, the deep learning model also includes a first branch and a second branch, and the deep temporal features output from the LSTM network pass through both. Because the wheel encoder provides linear velocity information in the X direction, the first branch adopts a relatively simple pure-attention path: the three-dimensional output is first flattened into a one-dimensional feature vector by a Flatten layer, then passed through a fully connected layer of 128 neurons that uses the ReLU activation function and L1/L2 regularization, then through a Dropout(0.25) layer that randomly discards some neurons to prevent overfitting, and finally through a one-neuron output layer that generates the X-coordinate prediction. The second branch, which lacks direct observation information, employs a dual-path fusion design. The first path condenses the shared-layer output into a 64-dimensional global representation via an LSTM layer and then maps it to a high-dimensional space via Dense(128). The second path extracts local features from the attention output via Conv1D, compresses the spatiotemporal information via GlobalAveragePooling1D, and then refines the core features via Dense(64). The two sets of features are fused at a Concatenate layer and integrated via Dense(64); a residual connection (Add) then mitigates gradient vanishing, Dropout(0.3) prevents overfitting, and Dense(1) outputs the Y-coordinate prediction. The dual-branch structure as a whole thus models the different observation characteristics of the two directions separately.
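The first branch can be sketched as a plain NumPy forward pass. The layer sizes follow the text (128 hidden neurons, dropout rate 0.25, one output neuron); the regularization coefficients, the parameter names, and the use of inverted dropout scaling are assumptions for this sketch.

```python
import numpy as np

def l1_l2_penalty(weights, l1=1e-4, l2=1e-4):
    """Regularization term added to the training loss; both coefficients are hypothetical."""
    return l1 * np.sum(np.abs(weights)) + l2 * np.sum(weights ** 2)

def x_branch(features, w_hidden, b_hidden, w_out, b_out,
             drop_rate=0.25, training=False, rng=None):
    """features: (batch, T, d) deep temporal features; returns (batch, 1) X predictions."""
    flat = features.reshape(features.shape[0], -1)          # Flatten layer
    hidden = np.maximum(flat @ w_hidden + b_hidden, 0.0)    # fully connected layer + ReLU
    if training:                                            # dropout is active only in training
        rng = np.random.default_rng() if rng is None else rng
        keep = rng.random(hidden.shape) >= drop_rate
        hidden = hidden * keep / (1.0 - drop_rate)
    return hidden @ w_out + b_out                           # one-neuron output layer
```

At inference time (`training=False`) the branch is deterministic, so repeated calls on the same features give the same X coordinate.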
[0151] Therefore, the second branch in this embodiment includes a first path, a second path, and a fusion path. The fusion path includes a fully connected layer, a residual connected layer, a Dropout layer, and an output layer. In this embodiment, the Y coordinate is obtained based on the following method:
[0152] The deep temporal features are input into the first path and the second path, respectively. The first path maps the deep temporal features to a high-dimensional space to obtain high-dimensional temporal features, whose dimension is greater than that of the deep temporal features. The second path performs convolution on the deep temporal features and compresses the spatiotemporal information of the convolved features to obtain local temporal features, whose dimension is the same as that of the deep temporal features. The high-dimensional temporal features and the local temporal features are concatenated to obtain concatenated temporal features, which are then input into the fully connected layer. The fully connected layer compresses the concatenated temporal features to obtain compressed temporal features with the same dimension as the deep temporal features. Both the compressed temporal features and the local temporal features are input into the residual connection layer, which adds them element-wise to obtain residual-connected temporal features. These are then input into the Dropout layer, which regularizes them to obtain regularized temporal features. Finally, the regularized temporal features are input into the output layer, which outputs the Y coordinate based on them.
[0153] In this embodiment, after the deep temporal features are input into the first path, the first path maps them to a high-dimensional space through a fully connected layer to obtain high-dimensional temporal features. The dimension of these features is greater than that of the original deep temporal features; the purpose of this dimensionality increase is to enhance the expressive power of the features and provide richer semantic information for subsequent fusion.
[0154] After inputting the deep temporal features into the second path, local patterns are first extracted through a one-dimensional convolutional layer to obtain a feature map containing spatial local information. Subsequently, a global average pooling layer is used to compress the variable-length temporal information along the temporal dimension, aggregating it into a fixed-length local temporal feature. This feature has the same dimension as the original deep temporal feature and is used to characterize the overall activation level of the local motion pattern.
[0155] The high-dimensional temporal features output from the first path and the local temporal features output from the second path are concatenated along the feature dimension to obtain concatenated temporal features that integrate global upscaling information and local compression information, providing a more comprehensive feature representation for subsequent integration.
[0156] After the concatenated temporal features are input into the fully connected layer, the layer compresses them to the same dimension as the original deep temporal features through a linear transformation, resulting in compressed temporal features. This step reduces the feature dimension while preserving the fusion information, facilitating subsequent residual connection operations.
[0157] Compressed temporal features and local temporal features are simultaneously input into the residual connection layer, which then adds them element-wise to obtain the residual connection temporal features. This cross-layer connection method allows gradients to propagate directly, effectively alleviating the gradient vanishing problem in deep network training, and enabling the network to learn the residual mapping between input and output, thus improving training stability.
[0158] After the residual-connected temporal features are input into the Dropout layer, this layer randomly sets the output of some neurons to zero with a preset probability (e.g., 0.3) during training, while scaling the values of the remaining neurons proportionally. This prevents complex co-adaptation relationships between neurons and reduces the risk of overfitting. This regularization essentially enhances the model's generalization ability by introducing random noise, making the network more robust to unseen data during the testing phase.
[0159] After regularized temporal features are input into the output layer, this layer maps them into a single numerical value through a fully connected linear network, ultimately outputting the predicted value of the Y-coordinate. This output is the model's final estimate of the lateral position at the current time step.
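The second-branch data flow described in the preceding paragraphs can be sketched as follows. Several simplifications are assumptions of this sketch and not part of the embodiment: the LSTM summarization in the first path is replaced by taking the last time step, ReLU is used wherever the text names no activation, the convolution uses no padding, and the parameter dictionary `p` and its keys are hypothetical.

```python
import numpy as np

def dense(x, w, b, relu=False):
    y = x @ w + b
    return np.maximum(y, 0.0) if relu else y

def conv1d_valid(seq, kernel, bias):
    """seq: (T, d_in); kernel: (k, d_in, d_out); returns (T - k + 1, d_out)."""
    k = kernel.shape[0]
    steps = seq.shape[0] - k + 1
    out = np.stack([
        np.tensordot(seq[t:t + k], kernel, axes=([0, 1], [0, 1]))
        for t in range(steps)
    ])
    return out + bias

def y_branch(features, p, drop_rate=0.3, training=False, rng=None):
    """features: (T, d) deep temporal features; returns a length-1 array with the Y prediction."""
    # First path: global summary (last time step stands in for the LSTM), then dimensionality increase.
    high_dim = dense(features[-1], p["w_up"], p["b_up"], relu=True)
    # Second path: local patterns via 1-D convolution, then global average pooling over time.
    conv = np.maximum(conv1d_valid(features, p["kernel"], p["b_conv"]), 0.0)
    local = dense(conv.mean(axis=0), p["w_local"], p["b_local"], relu=True)
    # Fusion path: concatenate, compress, add the residual, apply dropout, output.
    fused = np.concatenate([high_dim, local])
    compressed = dense(fused, p["w_fuse"], p["b_fuse"], relu=True)
    residual = compressed + local                 # element-wise addition (shapes must match)
    if training:
        rng = np.random.default_rng() if rng is None else rng
        keep = rng.random(residual.shape) >= drop_rate
        residual = residual * keep / (1.0 - drop_rate)
    return dense(residual, p["w_out"], p["b_out"])
```

The element-wise addition only works because the compressed features and the local features share one dimension, which is why the fully connected layer in the fusion path compresses back to that size.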
[0160] In this embodiment, the collected training dataset is divided into a training set and a test set in an 8:2 ratio. 80% of the data is used for model training, and the remaining 20% is used for testing. During training, the filtering result is used as one of the supervision signals to enable the network to learn the smoothness of the filtering method. To verify the effectiveness of each module of this invention, ablation experiments and comparative experiments were set up, comparing a total of 12 algorithms: EKF, CM-SWREKF, RNN, RNN+Kalman(RK), LSTM, LSTM+Kalman(LK), Transformer, Transformer+Kalman(TK), LSTM+Transformer(LT), LSTM+Transformer+Kalman(LTK), two-branch LSTM+Transformer(BLT), and two-branch LSTM+Transformer+Kalman(BLTK, i.e., the method of this invention). After training, the model only needs to input sensor data during the testing phase to directly output the predicted X and Y direction position coordinates. The table below summarizes the localization performance indicators of each algorithm, with specific data as follows:
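The 8:2 division can be sketched as below. The source does not state whether the split is random or chronological; this sketch assumes a chronological split, which keeps test samples later in time than training samples, and the function name is illustrative.

```python
import numpy as np

def split_8_2(features, targets, train_fraction=0.8):
    """Split time-ordered samples into a training part and a test part.

    Chronological order is an assumption of this sketch; the source only gives the ratio.
    """
    n = len(features)
    cut = int(round(n * train_fraction))
    return (features[:cut], targets[:cut]), (features[cut:], targets[cut:])

samples = np.arange(10)
labels = samples * 2
(train_x, train_y), (test_x, test_y) = split_8_2(samples, labels)
print(len(train_x), len(test_x))
```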
[0161] Table 1 Localization performance indicators of each algorithm
[0162]
[0163] As shown in the table, the BLTK algorithm proposed in this invention performs best in terms of RMSE (0.26m), mean absolute error in the X direction (0.18m), mean absolute error in the Y direction (0.19m), goodness of fit in the X direction (0.98), and goodness of fit in the Y direction (0.96). Compared with the traditional EKF (RMSE 1.48m), BLTK reduces the error by approximately 82.4%; compared with pure neural network models (RNN, LSTM, Transformer, RMSE between 0.65 and 0.79m), BLTK significantly improves accuracy. Ablation experiments show that using Kalman filter data as network input (such as RK, LK, TK, LTK, BLTK) can effectively improve the smoothness and accuracy of predicted trajectories; introducing the Transformer multi-head attention mechanism (comparing LK, TK, and LTK) can enhance the model's ability to model global dependencies; and adopting a dual-branch structure (comparing LTK and BLTK) further optimizes the prediction accuracy in the Y direction.
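The reported indicators and the quoted error reduction can be checked with a short sketch. The metric definitions below (Euclidean position RMSE, per-axis mean absolute error, per-axis coefficient of determination) are common conventions assumed here; the source does not spell out its exact formulas.

```python
import numpy as np

def rmse(pred, true):
    """Root mean square of the Euclidean position error; inputs have shape (N, 2)."""
    err = np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=1))))

def mae(pred_axis, true_axis):
    """Mean absolute error along one coordinate axis."""
    diff = np.asarray(pred_axis, dtype=float) - np.asarray(true_axis, dtype=float)
    return float(np.mean(np.abs(diff)))

def r2(pred_axis, true_axis):
    """Goodness of fit (coefficient of determination) along one coordinate axis."""
    pred_axis = np.asarray(pred_axis, dtype=float)
    true_axis = np.asarray(true_axis, dtype=float)
    ss_res = np.sum((true_axis - pred_axis) ** 2)
    ss_tot = np.sum((true_axis - true_axis.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def relative_reduction(baseline, improved):
    return (baseline - improved) / baseline

# RMSE values quoted in the text: EKF 1.48 m, BLTK 0.26 m.
print(f"{relative_reduction(1.48, 0.26):.1%}")  # prints 82.4%
```

The reduction follows from (1.48 - 0.26) / 1.48 = 1.22 / 1.48, which is approximately 0.824.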
[0164] This application provides an electronic device that may include a processor and a memory. The memory stores a computer program which, when executed by the processor, enables the electronic device to perform the corresponding steps in the pipeline robot positioning method described above.
[0165] In this embodiment, the processor can be an integrated circuit chip with signal processing capabilities. For example, the processor can be a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components, capable of implementing or executing the methods, steps, and logic block diagrams disclosed in the embodiments of this application.
[0166] The memory can be, but is not limited to, random access memory, read-only memory, programmable read-only memory, erasable programmable read-only memory, electrically erasable programmable read-only memory, etc.
[0167] Those skilled in the art will understand that, for convenience and brevity, the specific working process of the electronic device described above may be understood by reference to the corresponding steps of the aforementioned method and is not elaborated further here.
[0168] This application also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program that, when run on a computer, causes the computer to perform the pipeline robot positioning method as described in the above embodiments.
[0169] Computer-readable storage media may be magnetic disks, optical disks, read-only memory, random access memory, flash memory, USB flash drives, hard disks, or solid-state drives, etc., and may also include combinations of the above types of memory. It is understood that computers, processors, microprocessor controllers, or programmable hardware include storage components capable of storing or receiving software or computer code, which, when accessed and executed by the computer, processor, or hardware, implement the methods shown in the above embodiments.
[0170] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the pipeline robot positioning method described above. The computer program product may exist in a computer-readable storage medium in forms including, but not limited to, source files, executable files, and installation package files.
[0171] Based on the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented by hardware or by using software plus necessary general-purpose hardware platforms. Based on this understanding, the technical solution of this application can be embodied in the form of a software product. This software product can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, mobile hard drive, etc.) and includes several instructions to cause a computer device (such as a personal computer, electronic device, or network device, etc.) to execute the methods described in the various implementation scenarios of this application.
[0172] In the embodiments provided in this application, it should be understood that the disclosed methods can also be implemented in other ways. The method embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of methods and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or part of code, which includes one or more executable instructions for implementing a specified logical function. It should also be noted that each block in a block diagram and / or flowchart, and combinations of blocks in block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions. Furthermore, the functional modules in the various embodiments of this application can be integrated together to form an independent part, or each module can exist independently, or two or more modules can be integrated to form an independent part.
[0173] The above description is merely an embodiment of this application and is not intended to limit the scope of protection of this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of protection of this application.
Claims
1. A method for positioning a pipeline robot, characterized in that the method includes: acquiring actual sensor data during the operation of the pipeline robot; inputting the actual sensor data into a filtering model, and obtaining an initial estimated position of the pipeline robot through the filtering model; obtaining actual derived data based on the initial estimated position and the actual sensor data; inputting the initial estimated position, the actual sensor data, and the actual derived data into a deep learning model, wherein the deep learning model uses an LSTM network with a multi-head attention mechanism to process the initial estimated position, the actual sensor data, and the actual derived data into deep temporal features; and obtaining a final estimated position of the pipeline robot based on the deep temporal features, wherein the final estimated position is used to locate the pipeline robot; wherein the deep learning model includes an LSTM network, and the LSTM network includes LSTM layers, normalization layers, and multi-head attention layers; and wherein the processing, by the deep learning model using the LSTM network with the multi-head attention mechanism, of the initial estimated position, the actual sensor data, and the actual derived data into deep temporal features includes: using the initial estimated position, the actual sensor data, the actual derived data, and the corresponding time steps as initial input features and inputting them into the LSTM layer, the LSTM layer outputting a hidden state sequence based on the initial input features; inputting the hidden state sequence into the normalization layer, the normalization layer normalizing all components of the hidden state sequence at each time step to obtain a normalized sequence; and inputting the normalized sequence into the multi-head attention layer, the multi-head attention layer calculating, based on the normalized sequence and a preset model, the self-attention corresponding to each head in the multi-head attention layer, and obtaining the deep temporal features from all of the self-attentions.
2. The method according to claim 1, characterized in that the preset model is: $$\mathrm{head}_i = \mathrm{softmax}\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right)V_i, \qquad Q_i = QW_i^{Q},\quad K_i = KW_i^{K},\quad V_i = VW_i^{V}, \qquad d_k = \frac{d_{\mathrm{model}}}{h};$$ where $Q_i$ represents the query matrix of the i-th head, i = 1, 2, ..., h, and $h$ represents the number of heads; $K_i$ represents the key matrix of the i-th head; $V_i$ represents the value matrix of the i-th head; $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ respectively represent the projection matrix of the query matrix, the projection matrix of the key matrix, and the projection matrix of the value matrix learned by the i-th head; $d_k$ is the dimension of the normalized sequence at each head; and $d_{\mathrm{model}}$ represents the dimension of the hidden state sequence.
3. The method according to claim 1, characterized in that the deep learning model also includes a first branch and a second branch; and the process of obtaining the final estimated position of the pipeline robot based on the deep temporal features includes: inputting the deep temporal features into the first branch and the second branch respectively, outputting the X coordinate of the pipeline robot through the first branch, and outputting the Y coordinate of the pipeline robot through the second branch, wherein the combination of the X coordinate and the Y coordinate is the final estimated position, and the X coordinate and the Y coordinate are both located in the same specified coordinate system.
4. The method according to claim 3, characterized in that the second branch includes a first path, a second path, and a fusion path, wherein the fusion path includes a fully connected layer, a residual connection layer, a Dropout layer, and an output layer; and the step of outputting the Y coordinate of the pipeline robot through the second branch includes: inputting the deep temporal features into the first path and the second path respectively, the first path mapping the deep temporal features to a high-dimensional space to obtain high-dimensional temporal features whose dimension is greater than that of the deep temporal features, and the second path performing convolution on the deep temporal features and compressing the spatiotemporal information of the convolved deep temporal features to obtain local temporal features whose dimension is the same as that of the deep temporal features; concatenating the high-dimensional temporal features and the local temporal features to obtain concatenated temporal features; inputting the concatenated temporal features into the fully connected layer, the fully connected layer compressing the concatenated temporal features to obtain compressed temporal features with the same dimension as the deep temporal features; inputting both the compressed temporal features and the local temporal features into the residual connection layer, the residual connection layer adding the compressed temporal features and the local temporal features element by element to obtain residual-connected temporal features; inputting the residual-connected temporal features into the Dropout layer, the Dropout layer regularizing the residual-connected temporal features to obtain regularized temporal features; and inputting the regularized temporal features into the output layer, the output layer outputting the Y coordinate based on the regularized temporal features.
5. The method according to claim 1, characterized in that the step of inputting the actual sensor data into the filtering model and obtaining the initial estimated position of the pipeline robot through the filtering model includes: A1: based on the actual sensor data, obtaining, through the filtering model, the degree of certainty at time j, where k represents the current time and j represents a time step within the sliding window of the given length; A2: inputting the degree of certainty and a pre-stored initial measurement noise covariance matrix into a pre-established cloud theory model, and obtaining, through the cloud theory model, the measurement noise covariance matrix at the corresponding time; A3: determining, by the filtering model, the expectation of an auxiliary variable based on the prior state, the actual sensor data at that time, and the measurement noise covariance matrix; A4: determining the equivalent measurement noise variance based on the expectation of the measurement noise covariance matrix and the expectation of the auxiliary variable; A5: performing a Kalman update on the prior state and the prior covariance based on the equivalent measurement noise variance, to obtain the posterior state and the posterior covariance at that time; A6: taking the posterior state as the prior state and the posterior covariance as the prior covariance, and executing A3 to A5 a specified number of times; A7: obtaining the initial estimated position based on the posterior state obtained in A6 and the actual sensor data.
6. The method according to claim 1, characterized in that, before acquiring the actual sensor data during the operation of the pipeline robot, the method further includes: acquiring multiple pieces of historical sensor data and the historical real position corresponding to each piece of historical sensor data; inputting each piece of historical sensor data into the filtering model, and obtaining, through the filtering model, the historical estimated position corresponding to each piece of historical sensor data; obtaining historical derived data based on the historical estimated positions and the historical sensor data; integrating the historical sensor data, together with the corresponding historical estimated positions, historical derived data, and historical real positions, into training data; and training an initialized deep learning model based on the training data, so that after the initial estimated position, the actual sensor data, and the actual derived data are input into the deep learning model, the deep learning model outputs the final estimated position.
7. An electronic device, characterized in that, The electronic device includes a processor and a memory coupled together, the memory storing a computer program that, when executed by the processor, causes the electronic device to perform the method as described in any one of claims 1 to 6.
8. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program that, when run on a computer, causes the computer to perform the method as described in any one of claims 1 to 6.
9. A program product, characterized in that, It includes a computer program that, when executed by a processor, implements the method as described in any one of claims 1 to 6.