A method and system for hydrodynamic response prediction of modular underwater robots
By using 3D voxel modeling and a dual-input hybrid neural network, we have achieved six-degree-of-freedom hydrodynamic response prediction for modular underwater robots. This solves the problem of unified expression of complex variable geometry and motion conditions, improves prediction accuracy and efficiency, and is suitable for the rapid design and evaluation of modular underwater robots.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- DONGHAI LAB
- Filing Date
- 2026-05-14
- Publication Date
- 2026-06-12
Figure CN122199830A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the fields of underwater robot technology and intelligent computing technology, and particularly relates to a method and system for predicting the hydrodynamic response of a modular underwater robot.
Background Technology
[0002] The core challenge in addressing the multi-task adaptability requirements of modular underwater robots lies in the fact that different configurations formed by the rapid combination of standardized modules can significantly alter hydrodynamic responses (such as added mass, damping, and coupling torque) due to changes in shape, fluid interference between modules, and the separation characteristics of connecting structures. Traditional engineering modeling methods struggle to efficiently handle the repetitive modeling issues arising from such variable shapes, while existing data-driven solutions also have limitations in geometric representation and operational condition generalization. Existing technological approaches mainly face the following three shortcomings:
[0003] 1) Traditional dynamic modeling approach based on coefficient identification
[0004] This approach follows a process of "first identifying hydrodynamic coefficients, then substituting them into the six-degree-of-freedom equations," heavily relying on the specific shape of the robot. Once the module combination changes to form a new configuration, all hydrodynamic coefficients (including the added mass matrix, damping coefficients, etc.) must be re-identified through experiments or calculations. This process is not only repetitive and time-consuming, but also fails to accumulate cross-configuration knowledge, resulting in each modeling process starting "from scratch," making it difficult to adapt to engineering scenarios where the modular robot configuration changes frequently.
[0005] 2) High-fidelity simulation and experimental route
[0006] Conducting computational fluid dynamics (CFD) simulations or pool tests on each specific configuration can achieve high accuracy, but it consumes substantial computing power, incurs high testing costs, and has a long cycle. This approach cannot support rapid screening of configuration libraries or iterative evaluation of multiple schemes, and it is even less able to meet needs for immediate response, such as temporary on-site modifications. In essence, it remains a repetitive and costly verification for fixed shapes.
[0007] 3) Limitations of existing data-driven surrogate modeling solutions
[0008] Some studies have attempted to use data-driven methods to construct hydrodynamic surrogate models, but these often suffer from two major drawbacks:
[0009] Insufficient geometric representation: Relying only on a few low-dimensional geometric parameters (such as length, width, height, wetted surface area) or two-dimensional projection features as input, it is difficult to accurately describe the complex three-dimensional shape details, the interference effect between modules, and the flow field disturbances caused by the connecting structure.
[0010] Limitations of the modeling approach: It fails to deeply couple geometric information with motion conditions (such as velocity, angular velocity, and attitude) in the modeling, resulting in the model being limited to predictions under quasi-steady or narrow working conditions, with weak generalization ability, poor prediction stability for coupled hydrodynamic components (such as cross-damping moment), and insufficient engineering deployability.
[0011] In summary, there is currently a lack of a hydrodynamic prediction technology that can uniformly express complex and variable geometry and be closely coupled with motion conditions, so as to achieve a balance between accuracy, efficiency and generalization ability, thereby meeting the engineering needs of rapid evaluation and dynamic design of multi-configuration modular underwater robots.
Summary of the Invention
[0012] The purpose of this invention is to address the problems existing in the prior art and to provide a method and system for predicting the hydrodynamic response of modular underwater robots. This invention can directly predict the six-degree-of-freedom hydrodynamic wrench (force and torque) response from the external geometry and motion state, reducing the cost of configuration-by-configuration modeling and improving engineering evaluation efficiency and iteration speed.
[0013] To achieve the above-mentioned objectives, the present invention specifically adopts the following technical solution:
[0014] In a first aspect, the present invention provides a method for predicting the hydrodynamic response of a modular underwater robot, comprising the following steps:
[0015] S1. Perform three-dimensional voxel modeling and occupancy encoding on the three-dimensional mesh geometry of the underwater robot's shape in sequence to obtain the original voxel occupancy tensor composed of occupancy values and perform voxel normalization to obtain the normalized voxel occupancy tensor.
[0016] S2. A motion condition feature vector is constructed from the motion state information of the underwater robot in the local coordinate system. The standardized motion condition feature vector and the normalized voxel occupancy tensor are input into a pre-trained dual-input hybrid neural network. The geometric branch of the network maps the normalized voxel occupancy tensor into a geometric feature vector, and the motion branch of the network maps the motion condition feature vector into a motion feature vector. The regression branch then fuses the geometric feature vector and the motion feature vector. After the inference label generated from the fusion is restored to physical dimensions, the prediction result of the six-degree-of-freedom hydrodynamic wrench vector is obtained, thus completing the prediction of the hydrodynamic response of the underwater robot.
[0017] Based on the above scheme, each step can be implemented in the following preferred manner.
[0018] As a preferred embodiment of the first aspect mentioned above, the specific process for obtaining the original voxel occupancy tensor in step S1 is as follows:
[0019] S11. Align the 3D mesh geometry to a unified reference coordinate system through rigid body transformation, and calculate the bounding box after alignment;
[0020] S12. Construct a regular voxel grid on the bounding box with a preset voxel spacing to obtain the voxel dimensions;
[0021] S13. For each voxel index in the voxel mesh, if the voxel region corresponding to the voxel index satisfies the preset encoding condition, then the occupancy value corresponding to the voxel index is 1, otherwise it is 0; traverse the voxel mesh to obtain the original voxel occupancy tensor; wherein, the encoding condition is that the intersection of the voxel region and the three-dimensional mesh geometry is non-empty, or that the voxel region is contained within the entity of the underwater robot.
[0022] As a preferred embodiment of the first aspect, the specific process of obtaining the normalized voxel occupancy tensor in step S1 is as follows: first, the original voxel occupancy tensor is resampled in three dimensions. If the size of the resampled result is larger than the preset target size, the part that exceeds the target size boundary is clipped. If the size of the resampled result is smaller than the target size, the missing part is filled with zeros.
[0023] As a preferred embodiment of the first aspect, in step S2, the motion condition feature vector is fixed to 12 dimensions to characterize the motion state of the underwater robot, including the linear velocity, angular velocity, linear acceleration, and angular acceleration of the underwater robot; when performing Z-score standardization on the motion condition feature vector, the mean value used is the mean value of the motion condition feature vector in the training dataset of the network during training, and the standard deviation used is the standard deviation of the motion condition feature vector in the training dataset of the network during training.
[0024] As a preferred embodiment of the first aspect, in step S2, each training sample used by the dual-input hybrid neural network during training consists of three parts: the motion condition feature vector, the normalized voxel occupancy tensor, and the true label. Furthermore, the true label and the motion condition feature vector in the training data must be Z-score standardized.
[0025] As a preferred embodiment of the first aspect mentioned above, during training, the dual-input hybrid neural network uses the mean squared error between the standardized true label and the inference label output by the network as the basic loss. The basic loss and a regularization term on the network parameters are summed to obtain the total loss, and the network parameters are updated by minimizing the total loss.
[0026] As a preferred embodiment of the first aspect, in step S2, in the dual-input hybrid neural network, the geometric branch maps the normalized voxel occupancy tensor to a geometric feature vector through a geometric feature extraction network; the motion branch maps the motion condition feature vector to a motion feature vector through a motion feature extraction network; and the regression branch fuses the geometric feature vector and the motion feature vector, takes the resulting fused feature vector as input, and processes it through a series of cascaded fully connected layers to obtain the inference label.
[0027] Furthermore, the geometric feature extraction network is a three-dimensional convolutional neural network, a three-dimensional residual neural network, or a 3D UNet encoder.
[0028] Furthermore, the motion feature extraction network consists of multiple fully connected layers, with each fully connected layer followed by a ReLU activation function; alternatively, the motion feature extraction network is a multilayer perceptron or a gated network.
[0029] Furthermore, in the regression branch, the geometric feature vector and the motion feature vector are fused by concatenation, element-wise addition, attention fusion, or gating fusion.
[0030] In a second aspect, the present invention provides a hydrodynamic response prediction system for a modular underwater robot, comprising:
[0031] The voxelization module is used to perform three-dimensional voxelization modeling and occupancy encoding on the three-dimensional mesh geometry of the underwater robot's shape, obtain the original voxel occupancy tensor composed of occupancy values, and perform voxel normalization to obtain the normalized voxel occupancy tensor.
[0032] The prediction module constructs a motion condition feature vector from the underwater robot's motion state information in the local coordinate system. It inputs the standardized motion condition feature vector and the normalized voxel occupancy tensor into a pre-trained dual-input hybrid neural network. The geometric branch of the network maps the normalized voxel occupancy tensor into a geometric feature vector, and the motion branch maps the motion condition feature vector into a motion feature vector. The regression branch then fuses the geometric and motion feature vectors. After the inference label generated from the fusion is restored to physical dimensions, the prediction result of the six-degree-of-freedom hydrodynamic wrench vector is obtained, thus completing the prediction of the underwater robot's hydrodynamic response.
[0033] In a third aspect, the present invention provides a computer program product, including a computer program / instructions which, when executed by a processor, implement the hydrodynamic response prediction method for a modular underwater robot as described in any of the solutions of the first aspect above.
[0034] In a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the hydrodynamic response prediction method for modular underwater robots as described in any of the solutions of the first aspect above.
[0035] In a fifth aspect, the present invention provides a computer electronic device, which includes a memory and a processor;
[0036] The memory is used to store computer programs;
[0037] The processor is configured to, when executing the computer program, implement the hydrodynamic response prediction method for modular underwater robots as described in any of the solutions of the first aspect above.
[0038] This invention, based on three-dimensional voxelized modeling encoding and a dual-input hybrid neural network, achieves six-degree-of-freedom hydrodynamic response prediction for modular underwater robots. Compared with existing technologies, this invention has at least the following advantages:
[0039] Unified geometric coding: Voxelization is used to transform complex and variable shapes into a unified tensor input, enabling the network to learn the mapping from "geometric pattern to hydrodynamics", thus solving the problem that traditional methods have difficulty in uniformly representing complex shapes.
[0040] Geometric-working-condition coupled modeling: A dual-branch structure is used to extract geometric features and motion features separately and then fuse them for regression, which enhances the ability to express the coupling force / moment components.
[0041] Leakage-prevention training and evaluation mechanism: The training and validation sets are divided into groups according to geometry / configuration to avoid inflated metrics caused by leakage of samples with the same geometry, thereby improving engineering credibility.
[0042] Unified scaling: The input and output scaling transformations, together with the inverse transformation at inference, improve training stability and cross-data-domain usability, facilitating engineering deployment.
Attached Figure Description
[0043] Figure 1 is a schematic diagram of the overall process of the method of the present invention;
[0044] Figure 2 is a schematic diagram of the three-dimensional voxelization encoding and voxel normalization process;
[0045] Figure 3 is a schematic diagram of the dual-input hybrid neural network structure;
[0046] Figure 4 is a system block diagram of the present invention;
[0047] Figure 5 is a schematic diagram of a computer electronic device provided by the present invention.
Detailed Implementation
[0048] To make the above-mentioned objects, features, and advantages of the present invention more apparent and understandable, the specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Many specific details are set forth in the following description to provide a thorough understanding of the present invention. However, the present invention can be practiced in many other ways different from those described herein, and those skilled in the art can make similar modifications without departing from the spirit of the present invention. Therefore, the present invention is not limited to the specific embodiments disclosed below. Technical features in the various embodiments of the present invention can be combined accordingly without mutual conflict.
[0049] In the description of this invention, it should be understood that the terms "first" and "second" are used only for descriptive purposes and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Therefore, a feature defined with "first" and "second" may explicitly or implicitly include at least one of those features.
[0050] This invention provides a method for predicting the hydrodynamic response of a modular underwater robot, which outputs six-degree-of-freedom hydrodynamic wrench (force and torque) prediction results under given underwater robot geometry and motion conditions. As shown in Figure 1, in a preferred embodiment of the present invention, the hydrodynamic response prediction method for modular underwater robots includes the following steps S1 to S2. The specific implementation process of each step is described in detail below.
[0051] S1. Perform three-dimensional voxel modeling and occupancy encoding on the three-dimensional mesh geometry of the underwater robot's shape in sequence to obtain the original voxel occupancy tensor composed of occupancy values and perform voxel normalization to obtain the normalized voxel occupancy tensor.
[0052] It should be noted that in step S1 of this invention, the aforementioned three-dimensional mesh geometry can be either a triangular mesh or another surface discretization. Besides STL meshes, the input format for the three-dimensional mesh geometry can also be OBJ / PLY meshes, point clouds, direct voxel generation, SDF / TSDF, occupancy + normal / thickness, and other multi-channel geometric representations; no limitation is imposed in this invention.
[0053] As shown in Figure 2, in step S1 of this invention, the specific process for obtaining the original voxel occupancy tensor is as follows:
[0054] S11. Align the three-dimensional mesh geometry $M$ to a unified reference coordinate system through a rigid body transformation, and calculate the bounding box $B$ after alignment.
[0055] In S11 of this embodiment, when unifying the coordinate system and scale of the three-dimensional mesh geometry, the geometric center, centroid, or interface reference point can be used as the alignment reference, and the resulting alignment transformation is recorded. Furthermore, a margin can be added outward according to the external dimensions to avoid boundary truncation.
[0056] S12. On the bounding box $B$, construct a regular voxel grid with a preset voxel spacing (resolution) $\Delta$ to obtain the voxel dimensions $N_x \times N_y \times N_z$, where $N_x$, $N_y$, and $N_z$ are the numbers of voxels in the x, y, and z directions of the voxel grid, i.e., the grid resolution in each dimension.
[0057] S13. For each voxel index in the voxel mesh, if the voxel region corresponding to the voxel index satisfies the preset encoding condition, then the occupancy value corresponding to the voxel index is 1, otherwise it is 0; traverse the voxel mesh to obtain the original voxel occupancy tensor; wherein, the encoding condition is that the intersection of the voxel region and the three-dimensional mesh geometry is non-empty, or that the voxel region is contained within the entity of the underwater robot.
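[0058] In S13 of this embodiment, for a voxel index $(i, j, k)$, the corresponding voxel region is denoted $\Omega_{ijk}$ and the corresponding occupancy value is denoted $V(i, j, k)$. With $M$ denoting the aligned three-dimensional mesh geometry and $\mathrm{int}(M)$ the solid it encloses, the occupancy value is expressed as:
[0059] $$V(i,j,k)=\begin{cases}1, & \Omega_{ijk}\cap M\neq\varnothing \ \text{or}\ \Omega_{ijk}\subseteq \mathrm{int}(M)\\ 0, & \text{otherwise}\end{cases}$$
[0060] where $\varnothing$ represents the empty set. Therefore, by traversing the voxel indices in the voxel grid, the original voxel occupancy tensor of size $N_x \times N_y \times N_z$ (the voxel dimensions obtained in S12) can be obtained, in which each element is an occupancy value of 0 or 1.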
[0061] Furthermore, in this embodiment S13, the above-mentioned "intersection / interior" determination condition can be implemented using conventional methods in the art, such as mesh-voxel intersection detection, point-in-polyhedron testing, ray casting, etc. Specific implementation details are not limited in this invention, but the encoding conditions of each voxel index must be consistent and repeatable. The obtained original voxel occupancy tensor can also be expanded into multi-channel voxels, such as occupancy / distance field / surface label / module category, to enhance geometric semantic expression.
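The occupancy encoding of S12 and S13 can be illustrated with a minimal NumPy sketch. It is not the implementation of this invention: it assumes the solid is available as a hypothetical `inside` predicate and tests only the voxel centre, which approximates the "contained within the entity" condition rather than performing a full mesh-voxel intersection test. The names `voxelize` and `inside_ellipsoid` are illustrative.

```python
import numpy as np

def voxelize(inside, bbox_min, bbox_max, spacing, margin=0.0):
    """Return a binary occupancy tensor sampled at voxel centres.

    inside   : hypothetical callable mapping (N, 3) points to a boolean mask
    bbox_min : lower corner of the aligned bounding box
    bbox_max : upper corner of the aligned bounding box
    spacing  : preset voxel spacing (identical in x, y, z)
    margin   : optional outward margin to avoid boundary truncation
    """
    lo = np.asarray(bbox_min, dtype=float) - margin
    hi = np.asarray(bbox_max, dtype=float) + margin
    # Number of voxels per axis (Nx, Ny, Nz), at least one per axis.
    dims = np.maximum(np.ceil((hi - lo) / spacing - 1e-9).astype(int), 1)
    # Voxel-centre coordinates along each axis.
    axes = [lo[a] + (np.arange(dims[a]) + 0.5) * spacing for a in range(3)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    centres = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    # Occupancy is 1 where the centre lies inside the solid, else 0.
    occ = inside(centres).reshape(tuple(dims))
    return occ.astype(np.uint8)

def inside_ellipsoid(points, radii=(1.0, 0.5, 0.25)):
    """Toy stand-in for a robot hull: an axis-aligned ellipsoid."""
    return np.sum((points / np.asarray(radii)) ** 2, axis=1) <= 1.0

occupancy = voxelize(inside_ellipsoid, (-1.0, -0.5, -0.25),
                     (1.0, 0.5, 0.25), spacing=0.125)
```

For a real STL mesh, the `inside` predicate would be replaced by one of the tests named above (mesh-voxel intersection, point-in-polyhedron, or ray casting); the rule only needs to be applied identically to every voxel.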
[0062] It should be noted that in step S1 of this invention, the specific process for obtaining the normalized voxel occupancy tensor is as follows: First, the original voxel occupancy tensor is resampled in three dimensions. If the size of the resampled result is larger than the preset target size, the part that exceeds the target size boundary is clipped; if the size of the resampled result is smaller than the target size, the missing part is filled with zeros.
[0063] In this invention, because different configurations or geometric scales may lead to different voxel sizes, three-dimensional resampling is required to unify the size. Since discrete mapping may produce boundary differences of 1-2 voxels, this invention further designs the aforementioned clipping and filling rules for correction. That is, parts exceeding the target size boundary are clipped, and parts falling short of the target size boundary are filled, ultimately ensuring that the dimensions of the normalized voxel occupancy tensor are strictly equal to the target size. The three-dimensional resampling method used here can be nearest neighbor, trilinear, etc.; the filling method can be zero-filling, boundary duplication, or symmetric filling; clipping can be performed around the centroid or the bounding box.
[0064] In this embodiment, nearest neighbor sampling is preferably used to keep the occupancy value at 0 or 1, avoiding "grayscale occupancy" that leads to blurred geometric boundaries. In nearest neighbor sampling, the target size needs to be determined first. This size can be taken as the "dataset reference voxel size," for example, using the voxel size of the reference configuration (or the first sample configuration) as a unified target to ensure consistency between subsequent training and inference inputs. The size of the original voxel occupancy tensor is denoted $N_x \times N_y \times N_z$, and the target size is denoted $N_x^{*} \times N_y^{*} \times N_z^{*}$. The scaling factors $s_x$, $s_y$, $s_z$ in the x, y, and z directions are then:
[0065] $$s_x = \frac{N_x^{*}}{N_x}$$
[0066] $$s_y = \frac{N_y^{*}}{N_y}$$
[0067] $$s_z = \frac{N_z^{*}}{N_z}$$
[0068] Then, the scaling factor mentioned above can be applied to the original voxel occupancy tensor to process it, thereby normalizing the voxel size to a uniform target size and fixing the network input dimension.
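The voxel normalization described above (nearest neighbor resampling followed by clipping or zero-filling) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names are hypothetical, the scaling factor is taken as target size divided by original size, and clipping and filling are anchored at the low-index corner rather than around the centroid.

```python
import numpy as np

def resample_nearest(occ, target_shape):
    """Nearest-neighbour 3D resampling that keeps values in {0, 1}."""
    idx = []
    for n_src, n_dst in zip(occ.shape, target_shape):
        scale = n_dst / n_src  # scaling factor: target size / original size
        # Map each target voxel centre back to a source voxel index.
        src = np.floor((np.arange(n_dst) + 0.5) / scale).astype(int)
        idx.append(np.clip(src, 0, n_src - 1))
    return occ[np.ix_(*idx)]

def crop_or_pad(occ, target_shape):
    """Clip excess voxels and zero-fill missing ones so the shape matches."""
    out = np.zeros(target_shape, dtype=occ.dtype)
    common = tuple(slice(0, min(s, t)) for s, t in zip(occ.shape, target_shape))
    out[common] = occ[common]
    return out

def normalize_voxels(occ, target_shape):
    """Resample, then enforce the exact target size."""
    return crop_or_pad(resample_nearest(occ, target_shape), target_shape)

raw = np.zeros((10, 6, 4), dtype=np.uint8)
raw[2:8, 1:5, 1:3] = 1
normalized = normalize_voxels(raw, (20, 12, 8))
patched = crop_or_pad(np.ones((5, 3, 9), dtype=np.uint8), (4, 4, 8))
```

This particular resampler already returns the exact target shape, so `crop_or_pad` acts as a safeguard here; it matters when a resampler rounds the output size and leaves the 1-2 voxel boundary differences mentioned above.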
[0069] S2. A motion condition feature vector is constructed from the motion state information of the underwater robot in the local coordinate system. The standardized motion condition feature vector and the normalized voxel occupancy tensor are input into a pre-trained dual-input hybrid neural network. The geometric branch of the network maps the normalized voxel occupancy tensor into a geometric feature vector, and the motion branch of the network maps the motion condition feature vector into a motion feature vector. The regression branch then fuses the geometric feature vector and the motion feature vector. After the inference label generated from the fusion is restored to physical dimensions, the prediction result of the six-degree-of-freedom hydrodynamic wrench vector is obtained, thus completing the prediction of the hydrodynamic response of the underwater robot.
[0070] It should be noted that in step S2 of this invention, the motion condition feature vector is fixed at 12 dimensions and is used to characterize the motion state of the underwater robot, including its linear velocity, angular velocity, linear acceleration, and angular acceleration. It can be expressed as $x = [v_x, v_y, v_z, \omega_x, \omega_y, \omega_z, a_x, a_y, a_z, \alpha_x, \alpha_y, \alpha_z]^{\mathrm{T}}$, where $v_x, v_y, v_z$ are the three components of linear velocity in the local coordinate system; $\omega_x, \omega_y, \omega_z$ are the three components of angular velocity in the local coordinate system; $a_x, a_y, a_z$ are the three components of linear acceleration in the local coordinate system; $\alpha_x, \alpha_y, \alpha_z$ are the three components of angular acceleration in the local coordinate system; and $\mathrm{T}$ denotes the transpose. The local coordinate system is denoted $O\text{-}xyz$. Its axis directions can be set according to engineering practice (e.g., $x$ toward the bow, $y$ to starboard, $z$ downward), but consistency must be maintained across geometric feature encoding, motion feature encoding, and label output.
[0071] In this embodiment, the aforementioned motion condition feature vectors can be derived from simulation presets, sensor fusion, or kinematic playback. If acceleration is directly provided (e.g., simulation output or sensor output), it can be used directly; if only linear velocity / angular velocity is provided, the corresponding acceleration can be obtained using a discrete difference approximation. Taking the linear velocity component $v_x$ as an example, its corresponding linear acceleration component $a_x$ can be calculated using the following formula:
[0072] $$a_x(t) \approx \frac{v_x(t) - v_x(t - \Delta t)}{\Delta t}$$
[0073] where $a_x(t)$ is the linear acceleration component at time $t$; $v_x(t)$ and $v_x(t - \Delta t)$ are the linear velocity components at times $t$ and $t - \Delta t$, respectively; and $\Delta t$ is the difference step size, determined by the data sampling interval. Other components are calculated in the same way and are not described in detail in this invention. Furthermore, for engineering usability, the difference results can be smoothed by filtering. The specific type of filtering can be selected by those skilled in the art according to actual needs and is not limited in this invention.
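As a hedged illustration of how the 12-dimensional motion condition feature vector could be assembled when only velocities are logged, the sketch below approximates the accelerations with a backward difference over the sampling interval. The function names, the choice of a backward (rather than central) difference, and the handling of the first sample are assumptions made for this example; no smoothing filter is applied.

```python
import numpy as np

def backward_difference(values, dt):
    """Approximate the time derivative of a (T, 3) series by backward differences."""
    values = np.asarray(values, dtype=float)
    deriv = np.zeros_like(values)
    deriv[1:] = (values[1:] - values[:-1]) / dt
    # The first sample has no predecessor; reuse the next estimate (assumption).
    deriv[0] = deriv[1] if len(values) > 1 else 0.0
    return deriv

def motion_features(lin_vel, ang_vel, dt):
    """Stack [linear vel, angular vel, linear acc, angular acc] into (T, 12)."""
    lin_vel = np.asarray(lin_vel, dtype=float)
    ang_vel = np.asarray(ang_vel, dtype=float)
    lin_acc = backward_difference(lin_vel, dt)
    ang_acc = backward_difference(ang_vel, dt)
    return np.hstack([lin_vel, ang_vel, lin_acc, ang_acc])

# Toy trajectory: surge velocity grows linearly at 2 m/s^2, no rotation.
dt = 0.1
t = np.arange(5) * dt
lin_vel = np.stack([2.0 * t, 0.0 * t, 0.0 * t], axis=1)
ang_vel = np.zeros((5, 3))
features = motion_features(lin_vel, ang_vel, dt)
```

Each row of `features` is one 12-dimensional vector in the order linear velocity, angular velocity, linear acceleration, angular acceleration, all in the local coordinate system.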
[0074] It should be noted that in step S2 of this invention, when performing Z-score standardization on the motion condition feature vector, the mean used is the mean of the motion condition feature vector in the training dataset during network training, and the standard deviation used is the standard deviation of the motion condition feature vector in the training dataset during network training.
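[0075] In this embodiment, the above Z-score standardization can be expressed as:
[0076] $$\tilde{x} = \frac{x - \mu_x}{\sigma_x}$$
[0077] where $x$ is the motion condition feature vector and $\tilde{x}$ is the standardized motion condition feature vector; $\mu_x$ and $\sigma_x$ are the mean and standard deviation of the motion condition feature vectors in the training dataset during network training, respectively, and the division is applied element-wise.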
[0078] It should be noted that in step S2 of the present invention, each training sample used by the dual-input hybrid neural network during training consists of three parts: the motion condition feature vector, the normalized voxel occupancy tensor, and the true label. Furthermore, the true label and the motion condition feature vector in the training data must be Z-score standardized.
[0079] In this embodiment, to improve model training stability and ensure consistency during inference, in addition to the Z-score standardization of the motion condition feature vectors, the true label $y = [F_x, F_y, F_z, M_x, M_y, M_z]^{\mathrm{T}}$ is also subjected to a unified scaling transformation:
[0080] $$\tilde{y} = \frac{y - \mu_y}{\sigma_y}$$
[0081] where $\tilde{y}$ is the standardized true label; $\mu_y$ and $\sigma_y$ are the mean and standard deviation of the true labels in the training dataset during network training, respectively; $F_x, F_y, F_z$ are the three components of the hydrodynamic force in the local coordinate system, corresponding to longitudinal force, lateral force, and vertical force; and $M_x, M_y, M_z$ are the roll moment, pitch moment, and yaw moment in the local coordinate system, respectively.
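[0082] It is particularly important to note that the four statistics above (the mean and standard deviation of the motion condition feature vectors, $\mu_x$ and $\sigma_x$, and the mean and standard deviation of the true labels, $\mu_y$ and $\sigma_y$) must be determined and fixed in the training phase and reused unchanged in the inference phase, forming a "training-inference consistency constraint".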
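The training-inference consistency constraint can be illustrated with a small scaler that computes its statistics once on training data and reuses them for both the forward and the inverse transformation. This is a minimal sketch; the class name `ZScoreScaler` and the guard against zero standard deviation are assumptions of this example and are not stated in the source.

```python
import numpy as np

class ZScoreScaler:
    """Z-score scaler whose statistics are fixed after fit()."""

    def fit(self, data):
        data = np.asarray(data, dtype=float)
        self.mean = data.mean(axis=0)
        std = data.std(axis=0)
        # Guard against constant columns (assumption, not from the source).
        self.std = np.where(std < 1e-12, 1.0, std)
        return self

    def transform(self, data):
        return (np.asarray(data, dtype=float) - self.mean) / self.std

    def inverse_transform(self, data):
        return np.asarray(data, dtype=float) * self.std + self.mean

rng = np.random.default_rng(0)
train_motion = rng.normal(loc=3.0, scale=2.0, size=(200, 12))  # toy inputs
train_labels = rng.normal(loc=-5.0, scale=10.0, size=(200, 6))  # toy labels

motion_scaler = ZScoreScaler().fit(train_motion)
label_scaler = ZScoreScaler().fit(train_labels)

motion_std = motion_scaler.transform(train_motion)
labels_std = label_scaler.transform(train_labels)
labels_back = label_scaler.inverse_transform(labels_std)
```

At inference, only `transform` (for the motion vector) and `inverse_transform` (for the network output) are called; `fit` is never rerun on new data.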
[0083] It should be noted that in step S2 of this invention, during the training of the dual-input hybrid neural network, the mean squared error between the standardized true label and the inference label output by the network is used as the basic loss. The basic loss and a regularization term on the network parameters are summed to obtain the total loss, and the network parameters are updated by minimizing the total loss.
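[0084] In this embodiment, the aforementioned basic loss $L_{\mathrm{MSE}}$ can be represented as:
[0085] $$L_{\mathrm{MSE}}(\theta) = \frac{1}{N}\sum_{i=1}^{N}\left\| \hat{y}_i - \tilde{y}_i \right\|_2^2$$
[0086] where $\theta$ denotes the network parameters; $N$ is the number of training samples; $\hat{y}_i$ and $\tilde{y}_i$ are the inference label and the standardized true label corresponding to the $i$-th training sample, respectively; and $\|\cdot\|_2^2$ is the square of the L2 norm.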
[0087] Furthermore, this embodiment adds an L2 regularization term on the network parameters $\theta$ to the aforementioned basic loss $L_{\mathrm{MSE}}(\theta)$ to suppress overfitting. Therefore, the total loss $L(\theta)$ can be represented as:
[0088] $$L(\theta) = L_{\mathrm{MSE}}(\theta) + \lambda \left\| \theta \right\|_2^2$$
[0089] where $\lambda$ is the regularization intensity coefficient.
[0090] It is worth noting that the basic loss mentioned above can be MSE, Huber, or weighted MSE (weighted for key components); the regularization term can be Dropout, L2, early stopping, etc., and the final total loss form can be designed by those skilled in the art according to actual needs.
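For concreteness, the total loss described above (mean squared error in the standardized label space plus an L2 penalty on the parameters) could be computed as in the following sketch. The function name `total_loss` and the default coefficient are hypothetical, and the sketch only evaluates the loss; it does not perform the parameter update.

```python
import numpy as np

def total_loss(pred, target, params, lam=1e-4):
    """Return (total, base, reg) for standardized predictions and labels.

    pred, target : (N, 6) arrays in the standardized label space
    params       : iterable of parameter arrays (weights and biases)
    lam          : regularization intensity coefficient (illustrative default)
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    # Basic loss: squared L2 error per sample, averaged over the N samples.
    base = np.mean(np.sum((pred - target) ** 2, axis=1))
    # L2 regularization: squared norm of all parameters.
    reg = lam * sum(float(np.sum(np.asarray(p) ** 2)) for p in params)
    return base + reg, base, reg

pred = np.array([[1.0, 0, 0, 0, 0, 0], [0.0, 0, 0, 0, 0, 0]])
target = np.zeros((2, 6))
params = [np.array([1.0, 2.0])]
loss, base, reg = total_loss(pred, target, params, lam=0.1)
```

In this toy case the basic loss is 0.5 (one unit error over two samples) and the penalty is 0.1 × (1 + 4) = 0.5, so the total is 1.0. A weighted MSE or Huber loss would replace only the `base` line.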
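[0091] It should be noted that the dual-input hybrid neural network in step S2 of this invention aims to establish a dual-input mapping from geometric voxels and motion conditions to the hydrodynamic output. With $\bar{V}$ denoting the normalized voxel occupancy tensor and $\tilde{x}$ the standardized motion condition feature vector, the overall mapping in the normalized space is:
[0092] $$\hat{y} = f_{\theta}(\bar{V}, \tilde{x})$$
[0093] where $f_{\theta}$ is the dual-input hybrid neural network with parameters $\theta$, and $\hat{y}$ is the inference label.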
[0094] The aforementioned dual-input hybrid neural network consists of a geometric branch, a motion branch, and a regression branch. The geometric branch uses a geometric feature extraction network to map the normalized voxel occupancy tensor into a geometric feature vector; the motion branch uses a motion feature extraction network to map the motion condition feature vector into a motion feature vector; and the regression branch fuses the geometric and motion feature vectors and takes the resulting fused feature vector as input. This fused feature vector is then processed through a series of cascaded fully connected layers to obtain the inference label.
[0095] Furthermore, the geometric feature extraction network is a three-dimensional convolutional neural network (3D-CNN), a three-dimensional residual neural network (3D-ResNet), or a 3D UNet encoder; the motion feature extraction network consists of multiple fully connected layers, each followed by a ReLU activation function, or alternatively is a multilayer perceptron or a gating network.
[0096] Furthermore, in the regression branch, the geometric feature vector and the motion feature vector are fused by concatenation, element-wise addition, attention fusion, or gating fusion.
[0097] In this embodiment, the specific processing flow of the dual-input hybrid neural network is shown in Figure 3. The geometric feature extraction network is a three-dimensional convolutional neural network. A first three-dimensional convolutional layer with ReLU activation convolves the normalized voxel occupancy tensor, and the result is processed sequentially by a first max-pooling layer, a second three-dimensional convolutional layer with ReLU activation, and a second max-pooling layer. The features output from the second max-pooling layer are flattened and processed by a fully connected layer, and the geometric feature vector $g$ is output after a ReLU activation function. The motion feature extraction network consists of two fully connected layers: the standardized motion condition feature vector is processed sequentially through the first fully connected layer, a ReLU activation function, the second fully connected layer, and a ReLU activation function, outputting the motion feature vector $m$. Next, the geometric feature vector and the motion feature vector are concatenated to obtain the fused feature vector $z = [g; m]$, where $[\,\cdot\,;\cdot\,]$ is the concatenation operation. The fused feature vector is then processed through two fully connected layers, each followed by a ReLU activation function; dropout or other regularization mechanisms can be introduced to improve generalization ability. The inference label $\hat{y}$ is then output, and the inverse transformation restores the physical dimensions, yielding the prediction result $\hat{y}_{\mathrm{phys}}$ of the six-degree-of-freedom hydrodynamic wrench vector:
[0098] $$\hat{y}_{\mathrm{phys}} = \hat{y} \odot \sigma_y + \mu_y$$
where $\mu_y$ and $\sigma_y$ are the mean and standard deviation of the true labels in the training dataset, and $\odot$ denotes element-wise multiplication.
[0099] In addition, in order to adapt to the scale differences of underwater robot geometry in different directions, anisotropic downsampling (e.g., downsampling in two directions and maintaining higher resolution in a third direction) can be used in the geometric feature extraction network to better preserve key geometric details.
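To make the effect of anisotropic downsampling concrete, the sketch below traces tensor shapes through two convolution and max-pooling stages and compares isotropic pooling with pooling that leaves the third axis untouched. The 51×51×21 input size is the one reported for the dataset in this embodiment; the kernel size of 3 with padding 1, the pooling kernels, and the 32 output channels are assumptions chosen only for illustration.

```python
from typing import Tuple

Shape3D = Tuple[int, int, int]


def conv3d_shape(shape: Shape3D, kernel: int = 3, padding: int = 1,
                 stride: int = 1) -> Shape3D:
    """Spatial output shape of a 3D convolution (same kernel on all axes)."""
    return tuple((s + 2 * padding - kernel) // stride + 1 for s in shape)


def maxpool3d_shape(shape: Shape3D, kernel: Shape3D) -> Shape3D:
    """Spatial output shape of max pooling with stride equal to the kernel,
    no padding, and floor rounding."""
    return tuple((s - k) // k + 1 for s, k in zip(shape, kernel))


def flattened_size(input_shape: Shape3D, pool_kernel: Shape3D,
                   out_channels: int) -> int:
    """Length of the flattened feature after conv -> pool -> conv -> pool."""
    shape = input_shape
    for _ in range(2):
        shape = conv3d_shape(shape)
        shape = maxpool3d_shape(shape, pool_kernel)
    d, h, w = shape
    return out_channels * d * h * w


if __name__ == "__main__":
    grid = (51, 51, 21)
    print("isotropic  :", flattened_size(grid, (2, 2, 2), out_channels=32))
    print("anisotropic:", flattened_size(grid, (2, 2, 1), out_channels=32))
```

Under these assumptions, isotropic pooling reduces the third axis from 21 to 5 cells, while the anisotropic variant keeps all 21 cells at the cost of a larger flattened feature entering the fully connected layer.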
[0100] To better demonstrate the specific implementation and technical effects of the present invention, the hydrodynamic response prediction method for modular underwater robots shown in steps S1 to S2 of the above preferred implementation is applied to a specific example below.
[0101] Example
[0102] The specific implementation process of the hydrodynamic response prediction method for modular underwater robots used in this embodiment is as described above and will not be repeated here.
[0103] This embodiment verifies the prediction of the six-degree-of-freedom hydrodynamic response of a modular underwater robot. The experimental data consists of 4503 sets of 3D mesh geometry and corresponding motion condition feature vectors for different module combinations. The 3D mesh geometry is input in STL format and is converted into a voxel occupancy tensor by voxelization. The motion condition feature vectors include velocity and acceleration components in the local coordinate system.
[0104] In the experiment, the normalized voxel occupancy tensor is input into the geometric branch, and the motion condition feature vector is input into the motion branch. After feature fusion, the regression branch outputs a six-degree-of-freedom hydrodynamic spinor vector.
[0105] During model training, a large amount of training data is generated for the same geometric object under different motion conditions. If the training and validation sets are split randomly at the level of individual training data points, the same geometric object may appear in both sets, causing data leakage and artificially inflated generalization performance. Therefore, this method partitions the training and test sets by geometric configuration group, so that the training data corresponding to a given configuration appears in only one of the two sets. This verifies the generalization ability of the method of this invention for unseen configurations.
[0106] Specifically, in this embodiment, training data with the same geometric configuration are assigned the same group identifier. In scenarios where the modular configuration is enumerable, this identifier is constructed from the module list, connection relationships, and relative poses:
[0107]
[0108] Here, the identifier is computed by hash encoding over the type identifier of each module, the relative pose parameters of each module, the connection topology information, and the number of modules. In scenarios with non-modular geometry or inconsistent mesh sources, stable geometric fingerprints (e.g., hashes of the set of occupied voxels or hashes of statistical histograms) can be generated from the voxel occupancy tensor to obtain geometric group identifiers. The training, validation, and test sets are then partitioned using these geometric group identifiers as units. This ensures that all training data sharing a geometric group identifier enters exactly one set, which avoids geometric information leakage and allows generalization to unseen geometry to be evaluated accurately. The partition ratio can be set according to engineering needs (e.g., 8:2 or 9:1), but a fixed random seed or partition list must be recorded with the training records to ensure repeatability.
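The grouping scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the function names, the JSON serialization, the SHA-256 hash, the rounding of poses to six decimals, and the example module names are all hypothetical choices.

```python
import hashlib
import json
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def configuration_group_id(module_types: Sequence[str],
                           relative_poses: Sequence[Sequence[float]],
                           connections: Sequence[Tuple[int, int]]) -> str:
    """Hash module types, relative poses, connection topology, and module
    count into one identifier (serialization format is an assumption)."""
    payload = {
        "n": len(module_types),
        "types": list(module_types),
        "poses": [[round(v, 6) for v in pose] for pose in relative_poses],
        # Sort each pair and the pair list so edge order does not matter.
        "connections": sorted(sorted(pair) for pair in connections),
    }
    text = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def split_by_group(group_ids: Sequence[str], train_fraction: float = 0.8,
                   seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train, test) sample indices such that no group spans both."""
    by_group: Dict[str, List[int]] = defaultdict(list)
    for index, gid in enumerate(group_ids):
        by_group[gid].append(index)
    groups = sorted(by_group)            # deterministic order before shuffle
    random.Random(seed).shuffle(groups)  # fixed seed for repeatability
    n_train = int(round(train_fraction * len(groups)))
    train = [i for g in groups[:n_train] for i in by_group[g]]
    test = [i for g in groups[n_train:] for i in by_group[g]]
    return sorted(train), sorted(test)


if __name__ == "__main__":
    gid = configuration_group_id(["hull", "thruster"],
                                 [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                                 [(0, 1)])
    ids = [gid] * 3 + ["other-config"] * 2
    print(split_by_group(ids, train_fraction=0.5, seed=42))
```

Note that in this sketch the ratio applies to the number of groups rather than the number of samples, so the resulting sample counts only approximate the requested ratio when groups differ in size.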
[0109] To objectively evaluate the performance of this method, this embodiment uses mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and coefficient of determination (R²) as the evaluation indices. The experimental results are shown in Table 1.
[0110] Table 1. Experimental results of the method of the present invention in the hydrodynamic prediction task of a modular underwater robot.
[0111] Furthermore, in this embodiment, the prediction accuracy of the six hydrodynamic components was statistically analyzed, and the results are shown in Table 2.
[0112] Table 2. Prediction results of six-degree-of-freedom hydrodynamic components
[0113] As shown in Tables 1 and 2, the method of this invention can achieve high-precision prediction of six-degree-of-freedom hydrodynamic response across different module configurations. The overall coefficient of determination reaches 0.970, indicating a high degree of consistency between the prediction results and computational fluid dynamics simulation results.
[0114] Furthermore, a hydrodynamic dataset containing 43,669 training data points has been generated for this invention. Each training data point corresponds to a normalized voxel occupancy tensor of size 51×51×21 obtained from its 3D mesh geometry. The motion condition feature vector contains at least 12 features, including velocity and acceleration components, and the output is a six-degree-of-freedom force / torque response. With this unified data representation, data from different configurations and working conditions can be trained jointly within the same network framework, which significantly improves the unified representation capability and cross-configuration reuse capability for complex and variable shapes.
[0115] During training, this invention applies scaling transformations to both the motion condition feature vectors and the ground truth labels to improve training stability and numerical consistency. Using the Optuna automatic search mechanism, a total of 80 hyperparameter search trials are completed, with early stopping, pruning, and batch adaptive strategies under memory constraints. Experimental results show that the network training process converges well. In representative well-performing trials, the lowest validation loss reaches 0.195185, and the training loss remains stable in the range of approximately 0.052–0.055 after convergence, indicating that this invention possesses good training stability and repeatable optimization capabilities.
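The scaling transformation and its inverse can be illustrated with Z-score standardization, which the claims name for both the motion condition feature vector and the ground truth label. The sketch below is a minimal example under stated assumptions: the function names are hypothetical, the population standard deviation is used, and constant columns are guarded by substituting a standard deviation of 1.

```python
import math
from typing import List, Sequence, Tuple


def fit_zscore(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-column mean and population standard deviation, computed from the
    training rows only so that no test statistics leak into the scaling."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # guard against constant columns
    return means, stds


def standardize(row: Sequence[float], means: Sequence[float],
                stds: Sequence[float]) -> List[float]:
    """Map physical values to standardized (dimensionless) values."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def restore(row: Sequence[float], means: Sequence[float],
            stds: Sequence[float]) -> List[float]:
    """Inverse transformation: map standardized values back to physical units."""
    return [z * s + m for z, m, s in zip(row, means, stds)]


if __name__ == "__main__":
    train_labels = [[1.0, 10.0], [3.0, 10.0]]
    mu, sigma = fit_zscore(train_labels)
    z = standardize([3.0, 10.0], mu, sigma)
    print(z, restore(z, mu, sigma))
```

The same `restore` step is what converts the network's inference label back into a force / torque prediction in physical units, using the label statistics stored from training.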
[0116] In terms of prediction accuracy, this invention achieved high overall prediction accuracy in a batch evaluation of 43,669 training data points, with a mean absolute error of 0.403321, a root mean square error of 0.755832, and a coefficient of determination of 0.973725. This indicates that the invention has strong fitting and prediction capabilities for the main six-degree-of-freedom hydrodynamic components. In the linear correction analysis, the fitting coefficients of multiple components are close to 1, further demonstrating that the prediction results have good consistency with the target values.
[0117] On the independent test set, the present invention also demonstrates good generalization ability. For 3002 test data points, the mean absolute error was 0.594315, the root mean square error was 1.060950, and the coefficient of determination was 0.946775. This indicates that the present invention can not only achieve high-precision fitting within the training data range, but also maintain high overall prediction accuracy on test data that did not participate in the training, thus possessing good engineering application value.
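For reference, the four evaluation indices used above follow their standard definitions, sketched below for a single output component. The function name is hypothetical, and how the embodiment aggregates the indices across the six components is not stated in the text, so this sketch does not attempt to reproduce the reported values.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float],
                       y_pred: Sequence[float]) -> Dict[str, float]:
    """MAE, MSE, RMSE, and coefficient of determination for one component."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse), "r2": r2}


if __name__ == "__main__":
    print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]))
```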
[0118] Based on the above results, this invention transforms the traditional process of "repeatedly conducting CFD simulations, experiments, and hydrodynamic identification for each new configuration" into a process of "rapid inference and prediction through a unified model," significantly reducing the time, computational, and iteration costs in the multi-configuration design evaluation process. Simultaneously, this invention can directly output six-degree-of-freedom force / torque responses, facilitating modular configuration screening, scheme comparison, design optimization, and subsequent dynamic analysis, demonstrating strong engineering practicality. Its training and evaluation mechanism also reduces the risk of bias introduced by dimensional differences and data leakage through grouping, scale consistency, and batch data pipelines, thereby improving the reliability of the results.
[0119] In summary, the method of the present invention can predict the six-degree-of-freedom hydrodynamic spinor vector directly from the geometric configuration and motion state of a modular underwater robot, without repeating complete hydrodynamic coefficient identification for each module configuration. This improves the efficiency of configuration evaluation, reduces the cost of repeated simulation and testing, and is suitable for rapid design, configuration selection, and motion simulation analysis of modular underwater robots.
[0120] It should also be noted that the hydrodynamic response prediction method for modular underwater robots in the above embodiments can essentially be executed by a computer program or module. Therefore, based on the same inventive concept, another preferred embodiment of the present invention also provides a hydrodynamic response prediction system for modular underwater robots, corresponding to the hydrodynamic response prediction method for modular underwater robots provided in the above embodiments. As shown in Figure 4, the system includes:
[0121] The voxelization module is used to perform three-dimensional voxelization modeling and occupancy encoding on the three-dimensional mesh geometry of the underwater robot's shape, obtain the original voxel occupancy tensor composed of occupancy values, and perform voxel normalization to obtain the normalized voxel occupancy tensor.
[0122] The prediction module constructs the motion condition feature vector from the underwater robot's motion state information in the local coordinate system. It inputs the standardized motion condition feature vector and the normalized voxel occupancy tensor into a pre-trained dual-input hybrid neural network. The geometric branch of the network maps the normalized voxel occupancy tensor into a geometric feature vector, and the motion branch maps the motion condition feature vector into a motion feature vector. The regression branch then fuses the geometric and motion feature vectors. After the inference label generated from the fusion is restored to physical dimensions, the prediction result of the six-degree-of-freedom hydrodynamic spinor vector is obtained, thus completing the prediction of the underwater robot's hydrodynamic response.
[0123] It is understood that the hydrodynamic response prediction method for modular underwater robots described in S1-S2 above can essentially be implemented by a computer program. Therefore, based on the same inventive concept, another preferred embodiment of the present invention also provides a computer program product corresponding to the hydrodynamic response prediction method for modular underwater robots provided in the above embodiments, which includes a computer program / instructions. When executed by a processor, the computer program / instructions can implement the hydrodynamic response prediction method for modular underwater robots as described in the above embodiments.
[0124] Similarly, based on the same inventive concept, another preferred embodiment of the present invention also provides a computer electronic device corresponding to the hydrodynamic response prediction method for modular underwater robots provided in the above embodiments. As shown in Figure 5, the device includes a memory and a processor;
[0125] The memory is used to store computer programs;
[0126] The processor is configured to implement the hydrodynamic response prediction method for modular underwater robots in the above embodiments when executing the computer program.
[0127] Furthermore, the logical instructions in the aforementioned memory can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention.
[0128] Therefore, based on the same inventive concept, another preferred embodiment of the present invention also provides a computer-readable storage medium corresponding to the hydrodynamic response prediction method for modular underwater robots provided in the above embodiments. The storage medium stores a computer program that, when executed by a processor, can implement the hydrodynamic response prediction method for modular underwater robots in the above embodiments.
[0129] Specifically, in the computer-readable storage medium of the above three embodiments, the stored computer program is executed by a processor, which can perform the aforementioned steps S1 to S2.
[0130] It is understood that the aforementioned storage media may include random access memory (RAM) or non-volatile memory (NVM), such as at least one disk storage device. Furthermore, the storage media may also be various media capable of storing program code, such as USB flash drives, external hard drives, magnetic disks, or optical discs.
[0131] It is understood that the processors mentioned above can be general-purpose processors, including central processing units (CPUs), network processors (NPs), etc.; they can also be digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
[0132] It should also be noted that those skilled in the art will understand that, for the sake of convenience and brevity, the specific working process of the system described above can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here. In the embodiments provided in this application, the division of steps or modules in the system and method is merely a logical functional division, and there may be other division methods in actual implementation. For example, multiple modules or steps may be combined or integrated together, and a module or step may also be split.
[0133] The embodiments described above are merely preferred embodiments of the present invention and are not intended to limit the invention. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention. Therefore, all technical solutions obtained through equivalent substitution or transformation fall within the protection scope of the present invention.
Claims
1. A method for predicting the hydrodynamic response of a modular underwater robot, characterized in that it includes the following steps: S1. Perform three-dimensional voxel modeling and occupancy encoding, in sequence, on the three-dimensional mesh geometry of the underwater robot's shape to obtain the original voxel occupancy tensor composed of occupancy values, and perform voxel normalization to obtain the normalized voxel occupancy tensor. S2. Construct the motion condition feature vector from the motion state information of the underwater robot in the local coordinate system. The standardized motion condition feature vector and the normalized voxel occupancy tensor are input into a pre-trained dual-input hybrid neural network. The geometric branch of the network maps the normalized voxel occupancy tensor into a geometric feature vector, and the motion branch of the network maps the motion condition feature vector into a motion feature vector. Then, the regression branch fuses the geometric feature vector and the motion feature vector. After the inference label generated from the fusion is restored to physical dimensions, the prediction result of the six-degree-of-freedom hydrodynamic spinor vector is obtained, thus completing the prediction of the hydrodynamic response of the underwater robot.
2. The hydrodynamic response prediction method for modular underwater robots as described in claim 1, characterized in that, in step S1, the specific process of obtaining the original voxel occupancy tensor is as follows: S11. Align the 3D mesh geometry to a unified reference coordinate system through rigid body transformation, and calculate the bounding box after alignment; S12. Construct a regular voxel grid on the bounding box with a preset voxel spacing to obtain the voxel dimensions; S13. For each voxel index in the voxel grid, if the voxel region corresponding to the voxel index satisfies the preset encoding condition, then the occupancy value corresponding to the voxel index is 1, otherwise it is 0; traverse the voxel grid to obtain the original voxel occupancy tensor; wherein the encoding condition is that the intersection of the voxel region and the three-dimensional mesh geometry is non-empty, or that the voxel region is contained within the solid body of the underwater robot.
3. The hydrodynamic response prediction method for modular underwater robots as described in claim 1, characterized in that, in step S1, the specific process of obtaining the normalized voxel occupancy tensor is as follows: the original voxel occupancy tensor is first resampled in three dimensions; if the size of the resampled result is larger than the preset target size, the part that exceeds the target size boundary is truncated; if the size of the resampled result is smaller than the target size, the missing part is filled with zeros.
4. The hydrodynamic response prediction method for modular underwater robots as described in claim 1, characterized in that, in step S2, the motion condition feature vector is fixed at 12 dimensions to characterize the motion state of the underwater robot, including the linear velocity, angular velocity, linear acceleration, and angular acceleration of the underwater robot. When Z-score standardization is performed on the motion condition feature vector, the mean is the mean of the motion condition feature vector in the training dataset during network training, and the standard deviation is the standard deviation of the motion condition feature vector in the training dataset during network training.
5. The hydrodynamic response prediction method for modular underwater robots as described in claim 1, characterized in that, in step S2, during the training of the dual-input hybrid neural network, each training data point consists of three parts: the motion condition feature vector, the normalized voxel occupancy tensor, and the true label. The true label and the motion condition feature vector in the training data must be standardized by Z-score. During training, the dual-input hybrid neural network uses the mean squared error between the standardized true label and the inference label output by the network as the basic loss, sums the basic loss and a regularization term on the network parameters to obtain the total loss, and updates the network parameters by minimizing the total loss.
6. The hydrodynamic response prediction method for modular underwater robots as described in claim 1, characterized in that, in step S2, in the dual-input hybrid neural network, the geometric branch maps the normalized voxel occupancy tensor to a geometric feature vector through a geometric feature extraction network; the motion branch maps the motion condition feature vector to a motion feature vector through a motion feature extraction network; the regression branch fuses the geometric feature vector and the motion feature vector, and the resulting fused feature vector is processed by a series of cascaded fully connected layers to obtain the inference label. The geometric feature extraction network is a three-dimensional convolutional neural network, a three-dimensional residual neural network, or a 3D UNet encoder; the motion feature extraction network consists of multiple fully connected layers, each followed by a ReLU activation function; or, the motion feature extraction network is a multilayer perceptron or a gating network; in the regression branch, the geometric feature vector and the motion feature vector are fused by concatenation, element-wise addition, attention fusion, or gating fusion.
7. A hydrodynamic response prediction system for modular underwater robots, characterized in that it includes: a voxelization module, used to perform three-dimensional voxelization modeling and occupancy encoding on the three-dimensional mesh geometry of the underwater robot's shape, obtain the original voxel occupancy tensor composed of occupancy values, and perform voxel normalization to obtain the normalized voxel occupancy tensor; and a prediction module, used to construct the motion condition feature vector from the underwater robot's motion state information in the local coordinate system and to input the standardized motion condition feature vector and the normalized voxel occupancy tensor into a pre-trained dual-input hybrid neural network. The geometric branch of the network maps the normalized voxel occupancy tensor into a geometric feature vector, and the motion branch maps the motion condition feature vector into a motion feature vector. The regression branch then fuses the geometric and motion feature vectors. After the inference label generated from the fusion is restored to physical dimensions, the prediction result of the six-degree-of-freedom hydrodynamic spinor vector is obtained, thus completing the prediction of the underwater robot's hydrodynamic response.
8. A computer program product comprising a computer program / instructions, characterized in that, when the computer program / instructions are executed by a processor, they implement the hydrodynamic response prediction method for modular underwater robots as described in any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the hydrodynamic response prediction method for modular underwater robots as described in any one of claims 1 to 6.
10. A computer electronic device, characterized in that it includes a memory and a processor; the memory is used to store a computer program; the processor is configured to implement, when executing the computer program, the hydrodynamic response prediction method for modular underwater robots as described in any one of claims 1 to 6.