A method for predicting the attitude angular velocity of an amphibious vehicle using multi-dimensional cross-fusion perception.

CN121836038B · Active · Publication Date: 2026-06-30 · HEFEI UNIV OF TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: HEFEI UNIV OF TECH
Filing Date: 2026-02-11
Publication Date: 2026-06-30

Figure CN121836038B_ABST (abstract drawing)

Abstract

This invention relates to the field of data-driven vehicle state estimation technology, providing a method for predicting the attitude angular velocity of an amphibious vehicle through multi-dimensional cross-fusion perception. The method first collects the historical sequence of the vehicle's three-axis attitude angular velocity, inputting it into a multi-dimensional attitude feature extraction module for preliminary feature fusion and long-term temporal dependency extraction, resulting in 6-dimensional temporal features. Subsequently, this feature is concatenated with the original 3D sequence to form a 9-dimensional input, which is then fed into a multi-dimensional attitude cross-fusion prediction module. This module uses dimensional segmentation embedding, cross-encoding, and cross-decoding, and employs a modified two-stage attention mechanism combining Kolmogorov-Arnold networks to deeply mine the dependencies and interactions between time and feature dimensions, ultimately outputting a 3D attitude angular velocity prediction sequence for a fixed future time step. This invention significantly improves the accuracy and adaptability of amphibious vehicles' attitude prediction in complex cross-domain scenarios by explicitly modeling the complex coupling between the three-axis attitudes.

Description

Technical Field

[0001] This invention belongs to the field of data-driven vehicle state estimation technology, and in particular relates to a method for predicting the attitude angular velocity of an amphibious vehicle through multi-dimensional cross-fusion perception.

Background Technology

[0002] With the diversification of modern transportation demands, amphibious vehicles have gradually become important tools for handling transportation tasks in complex terrains, and are widely used in fields such as emergency rescue, military deployment, disaster management, and engineering operations. Meanwhile, the deep integration of neural network theory and application technology has provided new methodological and theoretical support for vehicle state estimation, further promoting the development of intelligent transportation. Against this backdrop, purely data-driven vehicle state estimation methods, relying on the powerful feature extraction and nonlinear fitting capabilities of neural networks, not only reduce dependence on explicit physical models but also significantly improve the accuracy and adaptability of estimation, becoming a key technological direction in the field of intelligent vehicles.

[0003] Existing technical solutions mainly rely on attention mechanisms that model only the time dimension. Their model architectures and learning methods make it difficult to explore the complex coupling relationships and interaction characteristics between different attitude feature dimensions (such as roll, pitch, and yaw). As a result, the cross-dimensional joint information contained in multi-dimensional time-series data is underused, which limits high-precision, highly adaptive attitude prediction for amphibious vehicles operating in more complex and variable scenarios.

Summary of the Invention

[0004] The purpose of this invention is to provide a method for predicting the attitude angular velocity of amphibious vehicles through multi-dimensional cross-fusion perception, aiming to solve the problems existing in the background technology.

[0005] This invention is implemented as follows: a method for predicting the attitude angular velocity of an amphibious vehicle using multi-dimensional cross-fusion sensing, comprising the following steps:

[0006] S1: Collect the historical time series of the three-axis attitude angular velocity of the amphibious vehicle, comprising roll angular velocity, pitch angular velocity, and yaw angular velocity; input this historical time series into the multi-dimensional attitude feature extraction module for feature extraction to obtain 6-dimensional temporal features.

[0007] S2: Concatenate the 6-dimensional temporal features obtained in step S1 with the original three-dimensional three-axis attitude angular velocity sequence to form a 9-dimensional input sequence.

[0008] S3: Input the 9-dimensional input sequence obtained in step S2 into the multi-dimensional attitude cross-fusion prediction module for processing, and output the three-dimensional three-axis attitude angular velocity prediction sequence within a fixed future time step.

[0009] S4: Calculate the total loss from the feature loss of the multi-dimensional attitude feature extraction module and the prediction loss of the multi-dimensional attitude cross-fusion prediction module, then perform backpropagation optimization to update the overall model parameters, thereby achieving end-to-end collaborative learning and model optimization.

[0010] The multi-dimensional attitude feature extraction module includes a preliminary extraction layer, a temporally dilated convolutional layer, and a feature mapping compression layer. The multi-dimensional attitude cross-fusion prediction module includes a dimension segmentation embedding layer, a cross-encoding layer, and a cross-decoding layer. The cross-encoding layer uses an improved two-stage attention mechanism to capture dependencies along the temporal dimension and the feature dimension simultaneously. The improved two-stage attention mechanism combines the original two-stage attention mechanism with a Kolmogorov-Arnold network.

[0011] The present invention provides a method for predicting the attitude angular velocity of an amphibious vehicle through multi-dimensional cross-fusion sensing, which has the following beneficial effects:

Attached Figure Description

[0012] Figure 1 is a diagram of the implementation scheme of the multi-dimensional attitude prediction technology for amphibious vehicles based on cross-fusion perception;

[0013] Figure 2 is a flowchart of the multi-dimensional attitude feature extraction module;

[0014] Figure 3 is a structural diagram of a single standard temporal residual block;

[0015] Figure 4 is a framework diagram of the fusion prediction;

[0016] Figure 5 is a structural diagram of the improved two-stage attention mechanism.

Detailed Implementation

[0017] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.

[0018] The specific implementation of the present invention will be described in detail below with reference to specific embodiments.

[0019] As shown in Figure 1 and Figure 5, the technical solution provided by this invention consists of two parts: a multi-dimensional attitude feature extraction module and a multi-dimensional attitude cross-fusion prediction module. The overall process is as follows. First, the multi-dimensional attitude feature extraction module performs an initial mining of the cross-relationships and temporal features of the three-axis attitude data. Then, the multi-dimensional attitude cross-fusion prediction module receives the 6-dimensional temporal features from the feature extraction module and concatenates them with the original three-dimensional sequence to form a 9-dimensional input sequence. Through the fusion prediction framework and the improved two-stage attention mechanism, the information in the temporal and feature dimensions is mined in depth, and the predicted three-dimensional attitude angular velocity within the fixed time step $T_{\tau}$ is output.

[0020] Specifically:

[0021] S1. Collect the historical time series of the three-axis attitude angular velocity of the amphibious vehicle, including roll angular velocity, pitch angular velocity and yaw angular velocity; input the historical time series of the three-axis attitude angular velocity into the multi-dimensional attitude feature extraction module for feature extraction to obtain 6-dimensional time series features.

[0022] The multi-dimensional attitude feature extraction module is implemented as shown in Figure 2. It consists of three progressive stages: a preliminary extraction layer, a temporally dilated convolutional layer, and a feature mapping compression layer. First, the preliminary extraction layer applies convolution to the input three-dimensional three-axis attitude angular velocity sequence, mapping it into a 16-dimensional feature sequence using 16 convolutional kernels. This sequence then flows into the temporally dilated convolutional layer, in which multiple standard temporal residual blocks are cascaded and the feature dimension is further expanded to 32, strengthening the layer's ability to model long-term temporal dependencies. Within this layer, causal convolution ensures that the output at each time point depends only on historical information, and dilated convolution expands the receptive field exponentially, achieving collaborative modeling and comprehensive extraction of multi-scale features of the time series. Finally, the feature mapping compression layer compresses the 32-dimensional features and maps them into the required 6-dimensional output, reducing the computational burden on subsequent networks while preserving key information.

[0023] Preliminary feature fusion extraction

[0024] This invention employs a convolutional neural network (CNN) to achieve preliminary fusion extraction of three-axis attitude angular velocities. Leveraging the powerful feature extraction, fitting, and feature learning capabilities of CNNs, it initially uncovers the interrelationships between roll, pitch, and yaw angular velocities. The preliminary fusion extraction of the CNN comprises three stages: a preliminary feature extraction layer, a temporally dilated convolutional layer, and a feature mapping compression layer. The preliminary feature extraction layer uses a standard convolution and a non-linear activation function, mapping the input three-dimensional feature sequence into a 16-dimensional feature representation through 16 convolutional kernels of size 3 × 3. Subsequently, the feature dimension is increased to 32 dimensions through the first temporal residual block in the temporally dilated convolutional layer to enhance the information extraction capability in the temporal dimension. To balance representation capability and computational efficiency, the feature mapping compression layer compresses the 32-dimensional feature sequence and maps it into the final required 6-dimensional output.

[0025] The initial feature extraction function is as follows:

[0026] Preliminary feature extraction layer ( ): ;

[0027] In this formula, the terms denote, in order: the standard convolution output; the convolution operation, parameterized by its kernel size, padding size, and dilation coefficient; the input tensor formed from the three-dimensional historical time series of the three-axis attitude angular velocity fed to the first layer, whose axes are the batch size, the number of channels (equal to the input feature dimension), and the time series length; the output of the preliminary feature extraction layer; and the nonlinear activation function. This layer performs preliminary feature extraction on the original three-dimensional input feature sequence.

[0028] Temporally dilated convolutional layer ( ): ;

[0029] In this formula, the left-hand term is the output sequence of the first of the four cascaded residual blocks in the temporally dilated convolutional layer, and the function applied is that of the first temporal residual block, which operates on the preliminary features extracted from the original three-dimensional input data. The first residual block of this layer raises the feature dimension to 32, strengthening information extraction along the temporal dimension.

[0030] Feature mapping compression layer ( ): ;

[0031] In this formula, the left-hand term is the final output sequence of the multi-feature extraction temporal convolution module, and the input term is the output sequence of the temporally dilated convolutional layer. The feature mapping compression layer compresses the 32-dimensional feature sequence and maps it to the final required 6-dimensional output.

[0032] Temporal Feature Extraction

[0033] This scheme employs a temporally dilated convolutional layer composed of four cascaded standard temporal residual blocks to extract temporal features. Causal convolution ensures that the output at each time point depends only on historical information, while dilated convolution expands the receptive field exponentially, so that feature dependencies over a long time range are captured efficiently. Specifically, dilated convolution introduces a dilation rate parameter, which enlarges the receptive field substantially without increasing the number of network parameters. In this scheme, after receiving the output of the preliminary feature extraction layer, the temporally dilated convolutional layer passes the data sequentially through four cascaded temporal residual blocks with dilation rates of 1, 2, 4, and 8, respectively, ultimately forming a receptive field covering 61 time steps. This hierarchical structure lets the lower layers focus on short-term local features while the higher layers capture long-term dependencies, thereby achieving collaborative modeling and comprehensive extraction of multi-scale features of the time series.

[0034] The formula for calculating the receptive field is as follows:

[0035] ;

[0036] In this formula, the terms denote the receptive field size, the number of residual block layers $L$, the kernel size used in the main branch of the residual block, and the dilation factor used in the $i$-th temporal residual block. With four layers of temporal residual blocks, the receptive field grows layer by layer, so that a single time step in the final output feature sequence can see information from the past 61 time steps.
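The 61-step figure can be checked numerically. The sketch below assumes the usual receptive-field expression for stacked dilated causal convolutions, $1 + \sum_{i=1}^{L} 2\,(k-1)\,d_i$, with two convolutions per residual block and a kernel size of $k = 3$. The function name and the `convs_per_block` parameter are illustrative and are not taken from the patent text.

```python
def receptive_field(kernel_size, dilations, convs_per_block=2):
    """Receptive field of cascaded residual blocks of dilated causal convolutions.

    Each convolution with kernel size k and dilation d extends the
    receptive field by (k - 1) * d past time steps.
    """
    return 1 + sum(convs_per_block * (kernel_size - 1) * d for d in dilations)


# Four blocks with dilation rates 1, 2, 4 and 8, as described in the text.
rf = receptive_field(kernel_size=3, dilations=[1, 2, 4, 8])
print(rf)  # 61
```

Under these assumptions the result matches the stated coverage of 61 time steps, which suggests that each residual block contributes two dilated convolutions of kernel size 3.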

[0037] The time residual block calculation function is defined as follows:

[0038] First convolutional sub-block: The second convolutional sub-block: ;

[0039] In these formulas, the terms denote, in order: the input sequence of the residual block; the output of the main branch calculation; the outputs of the convolutional sub-blocks of the $i$-th temporal residual block; the dilated convolution operation; the dilation factor used in the current temporal residual block; the causal pruning operation, which removes the excess padding after convolution so that no future information is seen; and the regularization operation. The two layers of dilated convolution with causal pruning capture the temporal features of the input time series from both short-term and long-term perspectives.

[0040] Residual branch calculation:

[0041] ;

[0042] Final output:

[0043] ;

[0044] In these formulas, the terms denote the output of the $i$-th residual block, the output of the residual branch, and the 1×1 convolution operation used to adjust dimensions. The residual branch applies the 1×1 convolution when the input and output dimensions are inconsistent, and passes the input through unchanged when they are consistent, so that the original information is preserved. The residual branch prevents gradient vanishing in deep layers and also counters the distortion of the original temporal information that repeated convolutions can cause, ensuring that the output retains the original signal features. In addition, merging the original and deep features yields a multi-scale representation.

[0045] Figure 3 is a structural diagram of a single standard temporal residual block.

[0046] As shown in Figure 3, a single standard temporal residual block consists of a main branch and a residual branch connected in parallel. The main branch contains two cascaded convolutional sub-blocks, each of which applies, in sequence, dilated convolution to expand the receptive field, causal pruning to strictly guarantee temporal causality, nonlinear activation, and regularization. Data flows into the residual branch at the same time as into the main branch. The residual branch first achieves dimension alignment and feature mapping through a conditional 1×1 convolution, which is enabled only when the input and output dimensions are inconsistent. Its core function is to avoid gradient vanishing during deep network training and to preserve the original feature information. Finally, the nonlinear high-order features extracted by the main branch are added element-wise to the identity mapping information of the residual branch, and the sum is passed through an activation. The result serves as the input to the next residual block or as the final output of the temporally dilated convolutional layer.
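A minimal single-channel sketch of such a block follows, assuming ReLU as the nonlinear activation and omitting the regularization step (as would be the case at inference time). Because input and output both have one channel here, the conditional 1×1 convolution is not needed and the residual branch is the identity. All function names are illustrative.

```python
def causal_dilated_conv(x, weights, dilation):
    """Single-channel dilated convolution whose output at t uses only x[<= t].

    Left-padding with (k - 1) * dilation zeros is equivalent to symmetric
    padding followed by causal pruning of the trailing samples.
    """
    k = len(weights)
    pad = (k - 1) * dilation
    padded = [0.0] * pad + list(x)
    return [
        sum(weights[j] * padded[t + j * dilation] for j in range(k))
        for t in range(len(x))
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def temporal_residual_block(x, w1, w2, dilation):
    """Two convolutional sub-blocks (main branch) plus an identity residual branch."""
    h = relu(causal_dilated_conv(x, w1, dilation))
    h = relu(causal_dilated_conv(h, w2, dilation))
    # Element-wise addition with the residual branch, then activation.
    return relu([a + b for a, b in zip(h, x)])


# Impulse response with all-ones kernels of size 3 and dilation 2.
impulse = [1.0] + [0.0] * 11
ones = [1.0, 1.0, 1.0]
response = temporal_residual_block(impulse, ones, ones, dilation=2)
print(response)
```

In this toy run the impulse at step 0 influences outputs up to step 8 and nothing later, which corresponds to a per-block receptive field of 9 steps for kernel size 3 and dilation 2.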

[0047] S2. The 6-dimensional temporal features obtained in step S1 are concatenated with the originally acquired three-dimensional three-axis attitude angular velocity sequence to form a 9-dimensional input sequence. As shown in Figure 1, this concatenation is performed by the multi-dimensional attitude cross-fusion prediction module, which receives the 6-dimensional temporal features from the multi-dimensional attitude feature extraction module.
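The concatenation in step S2 can be sketched as a per-time-step join of the two sequences. The channel order (raw channels first, extracted features second) and the requirement that both sequences have the same length are assumptions of this sketch; the patent text does not state them.

```python
def concat_features(raw_sequence, feature_sequence):
    """Join rows per time step: 3 raw channels + 6 extracted features -> 9 channels."""
    if len(raw_sequence) != len(feature_sequence):
        raise ValueError("both sequences must cover the same time steps")
    return [list(raw) + list(feat) for raw, feat in zip(raw_sequence, feature_sequence)]


# Two time steps of (roll, pitch, yaw) rates and placeholder 6-dimensional features.
raw = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
feats = [[0.0] * 6, [1.0] * 6]
combined = concat_features(raw, feats)
print(len(combined), len(combined[0]))  # 2 9
```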

[0048] S3. Input the 9-dimensional input sequence obtained in step S2 into the multi-dimensional attitude cross-fusion prediction module for processing, and output the three-dimensional three-axis attitude angular velocity prediction sequence within a fixed time step in the future. The specific implementation of this module is as follows:

[0049] As shown in Figure 1 and Figure 4, the technical solution disclosed herein consists of two parts: a multi-dimensional attitude feature extraction module and a multi-dimensional attitude cross-fusion prediction module. First, the multi-dimensional attitude feature extraction module uses its preliminary feature fusion extraction and temporal feature extraction functions to perform an initial mining of the cross-relationships between the three-axis attitude data and of the long-range temporal information along each time dimension. Then, the multi-dimensional attitude cross-fusion prediction module receives the 6-dimensional temporal features from the multi-dimensional attitude feature extraction module and concatenates them with the originally acquired three-dimensional three-axis attitude angular velocity sequence to form a 9-dimensional input sequence. Through the fusion prediction framework and the improved two-stage attention mechanism, it mines in depth the dependencies along the time dimension and the interaction information between feature dimensions, and gathers multi-scale features to predict the three-dimensional attitude angular velocity within a fixed future time step.

[0050] Fusion prediction framework

[0051] The fusion prediction framework consists of three regions: a dimension segmentation embedding layer, a cross-encoding layer, and a cross-decoding layer. In this invention, the module receives the 6-dimensional temporal features from the multi-dimensional attitude feature extraction module and concatenates them with the originally acquired three-dimensional three-axis attitude angular velocity sequence to form a 9-dimensional input sequence. First, this sequence is segmented and embedded. Then, the cross-encoding layer extracts the temporal dependencies within each dimension and the interaction features between dimensions. On this basis, the module predicts the three-dimensional attitude angular velocity within the fixed time step $T_{\tau}$.

[0052] As shown in Figure 4, the fusion prediction framework consists of three regions: a dimension segmentation embedding layer, a cross-encoding layer, and a cross-decoding layer. The input 9-dimensional time series first enters the dimension segmentation embedding layer, where it is segmented and each segment is given an embedded representation. The processed sequence then enters a three-layer stacked cross-encoding layer, which uses the improved two-stage attention mechanism (…) to capture the dependencies along the time dimension and between different feature dimensions simultaneously, and captures short-term and long-term trends through segment merging between layers. The cross-decoding layer, relying on its hierarchical structure, combines the outputs of the encoding layers layer by layer and gradually aggregates multi-scale features. Finally, by aggregating and filtering the outputs of all decoding layers, it outputs the three-dimensional three-axis attitude angular velocity prediction sequence for the future time step.

[0053] The regional functions of this fusion prediction framework are as follows:

[0054] (a) Dimensional segmentation embedding layer:

[0055] ;

[0056] ;

[0057] In these formulas, the terms denote, in order: the output sequence of the dimension segmentation embedding layer; the overall function of the dimension segmentation embedding layer; the input to that layer, i.e. the 9-dimensional time series fed to the model; the segmentation of the time series by segment length; the segment length; the $i$-th segment of a given dimension, which has that length; the linear projection; and the learnable positional embedding of each position (i.e. the added positional encoding). Data flowing into the dimension segmentation embedding layer is first divided into vector segments of equal length, in preparation for temporal self-attention over different time segments in the cross-encoding layer, and is then embedded through linear projection and the addition of positional encoding.
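A minimal sketch of this segmentation and embedding step follows. It assumes that the sequence length is a multiple of the segment length and that one projection matrix is shared by all dimensions; the names `seg_len`, `projection`, and `positions` are hypothetical.

```python
def segment_by_dimension(series, seg_len):
    """Split a T x D series into segments[d][i], each a list of seg_len values."""
    n_steps, n_dims = len(series), len(series[0])
    if n_steps % seg_len != 0:
        raise ValueError("sequence length must be a multiple of seg_len")
    return [
        [[series[t][d] for t in range(i * seg_len, (i + 1) * seg_len)]
         for i in range(n_steps // seg_len)]
        for d in range(n_dims)
    ]


def embed_segments(segments, projection, positions):
    """h[d][i] = projection @ segment + positions[d][i].

    projection: d_model x seg_len matrix; positions[d][i]: d_model vector.
    """
    return [
        [[sum(w * v for w, v in zip(row, seg)) + positions[d][i][m]
          for m, row in enumerate(projection)]
         for i, seg in enumerate(dim_segments)]
        for d, dim_segments in enumerate(segments)
    ]


# Toy 9-dimensional input of 8 time steps, segment length 4, embedding size 2.
series = [[float(10 * t + d) for d in range(9)] for t in range(8)]
segments = segment_by_dimension(series, seg_len=4)
projection = [[1.0, 0.0, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]]
positions = [[[0.0, 0.0] for _ in range(2)] for _ in range(9)]
embedded = embed_segments(segments, projection, positions)
print(embedded[0][1])  # [40.0, 55.0]
```

The result is a two-dimensional array of vectors indexed by feature dimension and time segment, which is the layout the later cross-time and cross-dimension attention stages operate on.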

[0058] (b) Cross-coding layer

[0059] Cross coding

[0060] ;

[0061] ;

[0062] In these formulas, the terms denote, in order: the two-dimensional array obtained from the dimension segmentation embedding layer; the output of the $l$-th encoder layer; the array obtained after segment merging in the $l$-th layer; the merging operation, which concatenates two adjacent time segments and then uses a fully connected layer to halve the time dimension; and the improved two-stage attention mechanism, which sequentially applies attention across the time dimension and attention across the feature dimension to the input multi-dimensional time series. Through multi-layer cross-encoding, every two adjacent vectors in the time domain are merged to obtain a representation over a larger time range, and the improved attention mechanism is then applied to capture dependencies at that scale.
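The segment merging step can be sketched for a single feature dimension as follows. The sketch assumes an even number of segments (how an odd count would be handled is not stated in the text), and the weight values are hypothetical placeholders for a learned fully connected layer.

```python
def merge_adjacent_segments(vectors, weights):
    """Concatenate each pair of neighbouring segment vectors and project.

    vectors: list of n segment embeddings (n even), each of length d_model.
    weights: d_out x (2 * d_model) matrix of the fully connected layer.
    Returns n // 2 vectors, so the number of time segments is halved.
    """
    if len(vectors) % 2 != 0:
        raise ValueError("this sketch expects an even number of segments")
    merged = []
    for i in range(0, len(vectors), 2):
        pair = list(vectors[i]) + list(vectors[i + 1])
        merged.append([sum(w * v for w, v in zip(row, pair)) for row in weights])
    return merged


segments = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
# Hypothetical weights that average the two halves of each pair.
averaging = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
coarse = merge_adjacent_segments(segments, averaging)
print(coarse)  # [[2.0, 3.0], [6.0, 7.0]]
```

Applying this between encoder layers means each deeper layer sees half as many segments, each summarizing twice the time span of the layer before.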

[0063] (c) Cross-decoding layer

[0064] Cross encoding / decoding:

[0065] ;

[0066] ;

[0067] ;

[0068] In these formulas, the terms denote, in order: the zero-initialized positional embedding sequence of the decoder; the output of the improved two-stage attention mechanism; the multi-head self-attention, which takes the output of the $l$-th decoder layer as the query and the output of the $l$-th encoder layer as the keys and values, applied to all vectors along a given dimension; the output of the multi-head self-attention; the multilayer perceptron; the layer normalization operation; and the output of the $l$-th decoder layer. The cross-decoder structure sequentially acquires feature information from the different decoding layers (i.e. different time scales), capturing both short-term and long-term trend information.

[0069] ;

[0070] ;

[0071] In these formulas, the terms denote, in order: a learnable matrix used to project vectors onto time series segments; the output of the $l$-th decoder layer; the operation that filters the target dimensions (the three-dimensional three-axis attitude angular velocity); and the final three-axis attitude angular velocity time series within the future time step. Based on the decoding outputs of the different layers, the cross-decoding layer filters the target dimension sequence to obtain the three-dimensional three-axis attitude angular velocity prediction for the future time step.
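A sketch of this final projection and filtering step follows. It assumes that the per-layer projections are aggregated by summation and that the three target dimensions occupy indices 0 to 2 of the 9-dimensional input; neither detail is confirmed by the text, and the function names are illustrative.

```python
def project_to_series(layer_output, weights):
    """Map decoder vectors layer_output[d][i] to a flat series per dimension.

    weights: seg_len x d_model matrix; each vector becomes one time segment.
    """
    return [
        [sum(w * v for w, v in zip(row, vec)) for vec in dim_vectors for row in weights]
        for dim_vectors in layer_output
    ]


def predict(layer_outputs, layer_weights, target_dims):
    """Sum the per-layer projections, then keep only the target dimensions."""
    total = None
    for output, weights in zip(layer_outputs, layer_weights):
        series = project_to_series(output, weights)
        if total is None:
            total = series
        else:
            total = [[a + b for a, b in zip(s, t)] for s, t in zip(total, series)]
    return [total[d] for d in target_dims]


# Two decoder layers, 9 dimensions, one segment per dimension, d_model = 2.
layer = [[[float(d), 1.0]] for d in range(9)]
identity = [[1.0, 0.0], [0.0, 1.0]]
forecast = predict([layer, layer], [identity, identity], target_dims=[0, 1, 2])
print(forecast)  # [[0.0, 2.0], [2.0, 2.0], [4.0, 2.0]]
```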

[0072] Deep cross-integration

[0073] This invention designs an improved two-stage attention mechanism, which combines the original two-stage attention mechanism with the Kolmogorov-Arnold network (KAN). With a limited amount of data, this mechanism can efficiently extract the dependencies along the time dimension and the interaction information between feature dimensions at the same time, thereby improving the accuracy and generalization ability of the multivariate time series prediction model.

[0074] The improved two-stage attention mechanism function is as follows:

[0075] (a) Cross-time stage of the two-stage attention mechanism

[0076] In the formulas for this stage, the terms denote, in order: the self-attention along the time dimension; all vectors along a given feature dimension; the output of the self-attention; the Kolmogorov-Arnold network, which replaces the original multilayer perceptron; and the output of the Kolmogorov-Arnold network. Self-attention and the Kolmogorov-Arnold network are applied in sequence within the same feature dimension to capture information along the time dimension.

[0077] (b) Cross-feature dimension stage of the two-stage attention mechanism

[0078] ;

[0079] ;

[0080] ;

[0081] ;

[0082] In these formulas, the terms denote, in order: the self-attention along the feature dimension; the vectors of all dimensions at a given time step; the learnable vector array $R$; the intermediate route $B$, which collects the aggregated information from all dimensions; the output of the router mechanism; the output of the self-attention; and the output of the Kolmogorov-Arnold network. In the cross-dimension stage, self-attention uses the learnable array $R$ as an intermediate route to aggregate the information from all dimensions at the same time step, which reduces computation and improves model efficiency. The Kolmogorov-Arnold network is then applied in place of the multilayer perceptron to capture information across feature dimensions.
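The router idea can be sketched with plain scaled dot-product attention. This sketch is single-head, has no learned query, key, or value projections, and omits residual connections, layer normalization, and the Kolmogorov-Arnold network; all of these simplifications are assumptions made for brevity, and the function names are illustrative.

```python
import math


def attention(queries, keys, values):
    """Scaled dot-product attention (single head, no learned projections)."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        top = max(scores)
        w = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(w)
        out.append([
            sum(w[i] / total * values[i][j] for i in range(len(values)))
            for j in range(len(values[0]))
        ])
    return out


def router_attention(z, routers):
    """Cross-dimension attention through a small set of router vectors.

    z: D vectors (one per feature dimension) at one time segment.
    routers: c learnable vectors, with c much smaller than D in practice.
    """
    gathered = attention(routers, z, z)        # routers collect from all D dimensions
    return attention(z, gathered, gathered)    # each dimension reads the c summaries


z = [[float(d), float(d % 3), 1.0, -float(d)] for d in range(9)]
routers = [[0.1, 0.0, 0.0, 0.0], [0.0, 0.1, 0.0, 0.0]]
mixed = router_attention(z, routers)
print(len(mixed), len(mixed[0]))  # 9 4
```

With D dimensions and c routers, the two attention calls compare 2·c·D vector pairs instead of the D·D pairs of direct self-attention across dimensions, which is the source of the computational saving described above.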

[0083] (c) Kolmogorov-Arnold Network (KAN)

[0084] ;

[0085] Dimensional Upgrading ;

[0086] Dimensionality reduction transformation ;

[0087] In these formulas, the terms denote, in order: the input data; the number of hidden layer dimensions; the number of feedforward layer dimensions; the learnable nonlinear function (a B-spline function); the dimension-raising transformation; the dimension-reducing transformation; the input data to, and output of, the dimension-raising transformation; and the output of the Kolmogorov-Arnold network. The dimension-raising transformation maps from the hidden layer dimension to the feedforward layer dimension. Each connection from an input node in the hidden layer dimension to an output node in the feedforward layer dimension has its own independent, learnable nonlinear function for fitting. This nonlinear activation structure gives the Kolmogorov-Arnold network stronger function fitting capability, enabling the model to construct richer feature representations in high-dimensional space.
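A minimal sketch of a Kolmogorov-Arnold layer with one learnable spline per connection follows. For brevity it uses degree-1 B-splines (hat functions) on a fixed uniform knot grid rather than higher-order splines, and it has no additional base activation; both are simplifications chosen for this sketch, not details from the patent. Inputs outside the knot range contribute zero here.

```python
def hat_basis(x, knots):
    """Degree-1 B-spline (hat function) values of x on uniformly spaced knots."""
    spacing = knots[1] - knots[0]
    return [max(0.0, 1.0 - abs(x - k) / spacing) for k in knots]


def kan_layer(x, coeffs, knots):
    """y[o] = sum_i phi_{o,i}(x[i]) with one learnable spline per edge i -> o.

    coeffs[o][i][g] is the coefficient of edge (i -> o) at knot g.
    """
    basis = [hat_basis(xi, knots) for xi in x]
    return [
        sum(sum(c * b for c, b in zip(edge, basis[i])) for i, edge in enumerate(edges))
        for edges in coeffs
    ]


def kan_block(x, up_coeffs, down_coeffs, knots):
    """Dimension-raising layer followed by a dimension-reducing layer."""
    return kan_layer(kan_layer(x, up_coeffs, knots), down_coeffs, knots)


knots = [-2.0, -1.0, 0.0, 1.0, 2.0]
identity_edge = list(knots)                # spline reproducing phi(x) = x on [-2, 2]
half_edge = [k / 2.0 for k in knots]       # spline reproducing phi(x) = x / 2
up = [[identity_edge], [identity_edge]]    # hidden size 1 -> feedforward size 2
down = [[half_edge, half_edge]]            # feedforward size 2 -> hidden size 1
out = kan_block([0.5], up, down, knots)
print(out)  # [0.5]
```

In a trained network the coefficients would be learned, so each connection could take a different nonlinear shape; the identity and halving splines above are chosen only so that the result is easy to verify by hand.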

[0088] S4. Calculate the total loss based on the feature loss of the multi-dimensional attitude feature extraction module and the prediction loss of the multi-dimensional attitude cross-fusion prediction module, perform backpropagation optimization, and update the overall model parameters to achieve end-to-end collaborative learning and model optimization. As shown in Figure 1 and its corresponding description, this step takes place after the prediction output is completed.

[0089] The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A method for predicting the attitude angular velocity of an amphibious vehicle using multi-dimensional cross-fusion sensing, characterized in that the method comprises:
S1: Collect the historical time series of the three-axis attitude angular velocity of the amphibious vehicle, including roll angular velocity, pitch angular velocity and yaw angular velocity; input the historical time series of the three-axis attitude angular velocity into the multi-dimensional attitude feature extraction module for feature extraction to obtain 6-dimensional time series features;
S2: Concatenate the 6-dimensional temporal features obtained in step S1 with the original three-dimensional three-axis attitude angular velocity sequence to form a 9-dimensional input sequence;
S3: Input the 9-dimensional input sequence obtained in step S2 into the multi-dimensional attitude cross-fusion prediction module for processing, and output the three-dimensional three-axis attitude angular velocity prediction sequence within a future fixed time step $T_{\tau}$;
S4: Calculate the total loss based on the feature loss of the multi-dimensional attitude feature extraction module and the prediction loss of the multi-dimensional attitude cross-fusion prediction module, perform backpropagation optimization and update the overall model parameters to achieve end-to-end collaborative learning and model optimization;
wherein the multi-dimensional attitude cross-fusion prediction module includes a dimension segmentation embedding layer, a cross-encoding layer, and a cross-decoding layer; the cross-encoding layer and the cross-decoding layer employ an improved two-stage attention mechanism to simultaneously capture the dependencies along the temporal dimension and the feature dimension; the improved two-stage attention mechanism combines the original two-stage attention mechanism with the Kolmogorov-Arnold network;
the function of the dimension segmentation embedding layer is: ; ; where the terms denote, in order: the output sequence of the dimension segmentation embedding layer; the overall function of the dimension segmentation embedding layer; the input to the dimension segmentation embedding layer; the segmentation of the time series by segment length; the segment length; the $i$-th segment of a given dimension, which has that length; the linear projection; and the learnable positional embedding of each position.

2. The method for predicting the attitude angular velocity of an amphibious vehicle using multi-dimensional cross-fusion sensing as described in claim 1, characterized in that the multi-dimensional attitude feature extraction module includes a preliminary extraction layer, a temporally dilated convolutional layer, and a feature mapping compression layer; the preliminary extraction layer maps the input three-dimensional feature sequence into a 16-dimensional feature representation using 16 convolutional kernels of size 3×3; the temporally dilated convolutional layer increases the feature dimension to 32 dimensions by cascading 4 standard temporal residual blocks; the feature mapping compression layer compresses the 32-dimensional feature sequence and maps it into a 6-dimensional output.

3. The method for predicting the attitude angular velocity of an amphibious vehicle using multi-dimensional cross-fusion sensing according to claim 2, characterized in that the preliminary feature extraction function of the preliminary extraction layer is: ; where the terms denote, in order: the standard convolution output; the convolution operation; the kernel size; the padding size; the dilation coefficient, stated as 1 / 3; the tensor formed from the three-dimensional historical time series of the three-axis attitude angular velocity fed to the first layer; the batch size; the number of channels, equal to the dimension of the input features; the length of the time series; the output of the preliminary feature extraction layer; and the nonlinear activation function.

4. The method for predicting the attitude angular velocity of an amphibious vehicle using multi-dimensional cross-fusion sensing according to claim 1, characterized in that the improved two-stage attention mechanism (TSA_K) includes a cross-time stage and a cross-feature dimension stage;
the cross-time stage sequentially applies self-attention and the Kolmogorov-Arnold network within the same feature dimension, with the function being: ; ; where the terms denote, in order: the self-attention along the time dimension; all vectors along a given feature dimension; the output of the self-attention; the Kolmogorov-Arnold network, which replaces the original multilayer perceptron; and the output of the Kolmogorov-Arnold network;
the cross-feature dimension stage aggregates information from all dimensions at the same time step through an intermediate route $B$ containing a learnable array $R$, and then passes it through the Kolmogorov-Arnold network, with the function being: ; ; ; ; where the terms denote, in order: the self-attention along the feature dimension; the vectors of all dimensions at a given time step; the learnable vector array; the intermediate route that collects the aggregated information from all dimensions; the output of the router mechanism; the output of the self-attention; and the output of the Kolmogorov-Arnold network.

5. The method for predicting the attitude angular velocity of an amphibious vehicle using multi-dimensional cross-fusion sensing according to claim 4, characterized in that the function of the Kolmogorov-Arnold network (KAN) is: ; where the terms denote, in order: the input data; the number of hidden layer dimensions; the number of feedforward layer dimensions; the learnable nonlinear function; the dimension-raising transformation; and the dimension-reducing transformation.

6. The method for predicting the attitude angular velocity of an amphibious vehicle using multi-dimensional cross-fusion sensing according to claim 1, characterized in that the cross-decoding layer filters the target dimension sequence based on the decoding outputs of the different layers, thereby obtaining the three-dimensional three-axis attitude angular velocity prediction for the future time step; the prediction function is: , , where the terms denote, in order: a learnable matrix used to project vectors onto time series segments; the output of the $l$-th decoder layer; the operation that filters the target dimensions (the three-dimensional three-axis attitude angular velocity); and the final three-axis attitude angular velocity time series within the future time step.