A biomimetic brain-like synchronous localization and environmental perception method for underwater robots
By employing a biomimetic brain-like simultaneous localization and environmental perception method, combined with acoustic and visual processing, IMU pre-integration, and deep learning networks, the problems of poor visual odometry performance and insufficient robustness in underwater robot navigation are solved, achieving high-precision underwater navigation and map building.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HARBIN ENG UNIV
- Filing Date
- 2023-11-30
- Publication Date
- 2026-06-30
AI Technical Summary
Existing underwater robot navigation technologies suffer from poor visual odometry performance, insufficient robustness, and low loop closure detection accuracy.
A biomimetic brain-like simultaneous localization and environmental perception method is adopted. By combining acoustic and visual processing, IMU pre-integration, a deep learning network, and point cloud matching with pose cells and visual templates, an experience map is constructed and loop closure detection is performed to improve navigation accuracy and robustness.
The method improves the accuracy and robustness of navigation in complex underwater environments, enabling the robot to autonomously build high-precision maps and reduce motion drift.
Figure CN117664106B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to underwater robot navigation, specifically to an underwater robot biomimetic brain-inspired SLAM (UBSLAM) method, and belongs to the fields of bionics and motion navigation technology.
Background Technology
[0002] High-precision underwater navigation and positioning capabilities have always been a bottleneck restricting underwater robots from achieving seabed topography exploration. Currently, there are three common underwater robot navigation technologies: inertial navigation, underwater acoustic navigation, and geophysical navigation. Each of these technologies has certain drawbacks. Inertial navigation errors accumulate over time; underwater acoustic navigation suffers from difficulties in array maintenance and retrieval, and has a limited detection range; while geophysical navigation requires prior information, and single geophysical prior information is insufficient to meet the high precision requirements of underwater navigation.
[0003] Simultaneous Localization and Mapping (SLAM) technology enables underwater robots to simultaneously construct a map of their surrounding environment and estimate their own pose within that map. Therefore, SLAM technology holds immense potential and scope for development in underwater navigation and positioning, and is of great significance for improving the autonomy of underwater robots.
[0004] The core idea of SLAM is to solve the questions of "Who am I, where am I, and what am I going to do?" This idea is very similar to the behavior in biological brain-inspired navigation. Biological brain-inspired navigation draws on the spatial navigation neural mechanisms of the hippocampus and related brain regions in biological brains, integrates multiple adaptive environmental perception methods such as vision and hearing, and utilizes neuromorphic computing models such as attractor neural networks to meet the autonomous navigation task requirements of intelligent agents in special and complex environments.
[0005] In summary, existing underwater robot navigation suffers from poor visual odometry performance, insufficient robustness, and low loop closure detection accuracy.
Summary of the Invention
[0006] To address the problems of poor visual odometry performance, insufficient robustness, and low loop closure detection accuracy in existing underwater robot navigation systems, this invention proposes a biomimetic brain-like synchronous localization and environmental perception method for underwater robots.
[0007] The technical solution adopted by the present invention to solve the above problems is as follows:
[0008] This invention includes the following steps:
[0009] Step 1: Process the acquired sensor information with the acoustic-visual processing method to form a local scene;
[0010] Step 2: Pre-integrate the IMU (Inertial Measurement Unit) data to obtain position, velocity, and rotation increments;
[0011] Step 3: Feed the processed training dataset into the network for learning to obtain the estimated displacement and estimated heading;
[0012] Step 4: Update the position and heading, and input them into the pose cells for pose representation;
[0013] Step 5: After acoustic-visual processing, input the external sensor data to the underwater robot as local scenes to construct local scene cells;
[0014] Step 6: Obtain the real-time scene at the current moment from the local scenes and match it against previously stored scenes to determine whether to reconstruct the environment;
[0015] Step 7: Connect the pose cell and the local scene cell at a given moment to form the experience point at that moment;
[0016] Step 8: Experience mapping. Construct and revise the experience map based on the experience points.
[0017] Furthermore, the acoustic-visual processing method in step one specifically includes: processing echo intensity information, suppressing measurement noise, and motion compensation.
[0018] Furthermore, the pre-integration in step two can directly calculate the relative pose between two frames, without depending on the initial value.
[0019] Furthermore, the network in step three includes a displacement estimation network and a heading estimation network, wherein the input of the heading estimation network is used as the tenth dimension of the displacement network input, and the robot's motion displacement is estimated using a CNN and an attention network.
[0020] Furthermore, the determination of whether to perform environment reconstruction in step six is based on comparing point cloud similarity.
[0021] Furthermore, generating a visual template from point cloud data involves transforming the point cloud data into a coordinate system and then performing point cloud matching to obtain the visual template.
[0022] Furthermore, the experience point in step seven is a physical region, and each experience point includes the pose cell activity state P and the visual template V.
[0023] Furthermore, revising the experience map in step eight refers to integrating all the information obtained by the robot through the experience map and updating the map through loop closure detection, thereby completing the construction of the experience map.
[0024] Furthermore, the construction of the experience map specifically involves comparing the pose cell encoding and visual template in the current state with the experience points stored in the experience map. If the difference is large enough, a new experience point is created; otherwise, it is determined to be a loop closure point.
[0025] The judgment method is as follows:
[0026] $S = \mu_p\,|P_i - P| + \mu_v\,|V_i - V|$
[0027] In the formula, $\mu_p$ and $\mu_v$ are the weights of the pose cell encoding and the visual template, respectively, and are chosen empirically; $P_i$ is the i-th pose cell encoding stored in the experience map; $V_i$ is the i-th visual template stored in the experience map. When $S$ is less than a given threshold $S_{MAX}$, the current scene is confirmed to be a scene that the robot has already experienced, i.e., a loop closure has occurred, and the nodes in the map are updated according to the following formula:
[0028]
[0029] In the formula, $\alpha$ is the correction rate constant, $N_f$ is the number of connections from experience point i to other experience points, and $N_t$ is the number of connections from other experience points to experience point i.
[0030] Furthermore, regarding motion compensation: since the robot is also moving while the sonar scans the environment, it cannot be assumed that the origin of all beams is at the same position. The real-time pose of the robot corresponding to each beam must be taken into account.
[0031] The beneficial effects of this invention are:
[0032] This invention provides a biomimetic brain-like synchronous localization and environmental perception method for underwater robots. The invention combines point cloud features with a spatial cognitive model, which broadens the application range of the algorithm; the method still works normally in the underwater environment and shows good robustness.
[0033] Meanwhile, this invention uses pre-integration and deep learning networks to process sensor information, which improves the robustness of the odometry and makes it more accurate in complex underwater environments.
Attached Figure Description
[0034] Figure 1 is an information flow diagram of the present invention;
[0035] Figure 2 is a comparison chart of the underwater robot localization results of UBSLAM (Underwater Brain-like Simultaneous Localization and Environmental Awareness) and dead reckoning;
[0036] Figure 3 is a comparison chart of the underwater positioning errors of UBSLAM and dead reckoning;
[0037] Figure 4 is a flowchart of the deep motion estimation algorithm.
Detailed Implementation
[0038] Specific Implementation Method One: With reference to Figures 1 to 3, this embodiment describes a biomimetic brain-like synchronous localization and environmental perception method for underwater robots, which includes the following steps:
[0039] Step 1: Process the acquired sensor information with the acoustic-visual processing method to form a local scene;
[0040] Step 2: Pre-integrate the IMU (Inertial Measurement Unit) data to obtain position, velocity, and rotation increments;
[0041] Step 3: Feed the processed training dataset into the network for learning to obtain the estimated displacement and estimated heading;
[0042] Step 4: Update the position and heading, and input them into the pose cells for pose representation;
[0043] Step 5: After acoustic-visual processing, input the external sensor data to the underwater robot as local scenes to construct local scene cells;
[0044] Step 6: Obtain the real-time scene at the current moment from the local scenes and match it against previously stored scenes to determine whether to reconstruct the environment;
[0045] Step 7: Connect the pose cell and the local scene cell at a given moment to form the experience point at that moment;
[0046] Step 8: Experience mapping. Construct and revise the experience map based on the experience points.
[0047] Application Scenarios: This invention can be applied to the construction of cognitive maps for robots in underwater environments. The robot acquires environmental features through sonar sensors and obtains its own motion information through navigation sensors, thereby constructing a cognitive map.
[0048] As shown in Figure 1, the robot first acquires environmental feature information through sonar sensors and its own motion information through navigation sensors. It then processes the sonar data using the acoustic-visual processing method to obtain a local scene template. The sensor data are pre-integrated, and the processed data are used as input to a convolutional neural network that outputs the motion displacement, thus forming the robot's perception of its own position. Finally, the experience map integrates the above information and is updated through loop closure detection, which reduces drift during robot movement and completes the construction of the experience map.
[0049] Specific Implementation Method Two: With reference to Figures 1 to 3, this embodiment describes the acoustic-visual processing method in step one, which specifically includes: processing echo intensity information, suppressing measurement noise, and motion compensation.
[0050] Motion compensation addresses the fact that the underwater robot is also moving while the sonar scans the environment. It cannot be assumed that the origin of all beams is at the same position; the real-time pose of the underwater robot corresponding to each beam must be taken into account.
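The motion compensation described above can be illustrated with a minimal planar sketch. It assumes that each beam is given as a (bearing, range) pair, that the robot pose at each beam time is already available (for example from interpolated odometry), and that the sonar frame coincides with the robot frame. The names `Pose2D` and `compensate_scan` are hypothetical and do not come from the patent.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose2D:
    """Planar robot pose in the world frame (hypothetical container)."""
    x: float
    y: float
    yaw: float  # radians


def compensate_scan(beams, poses):
    """Project each sonar return into the world frame using the pose at its own beam time.

    beams: list of (bearing_rad, range_m), one entry per beam, in the sonar frame.
    poses: list of Pose2D, the robot pose when the matching beam was acquired.
    """
    points = []
    for (bearing, rng), pose in zip(beams, poses):
        angle = pose.yaw + bearing  # beam direction in the world frame
        points.append((pose.x + rng * math.cos(angle),
                       pose.y + rng * math.sin(angle)))
    return points
```

Using a single pose for the whole scan would place both returns in the usage below at the same world point; with per-beam poses the second return is shifted by the distance the robot travelled between beams.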
[0051] Specific Implementation Method Three: With reference to Figures 1 to 3, this embodiment describes how the pre-integration in step two can directly calculate the relative pose between two frames, without relying on initial values.
[0052] The pre-integration is calculated as follows:
[0053]
[0054]
[0055]
[0056] Where $\Delta R_{ij}$ represents the relative rotation between time i and time j, computed from the k-th gyroscope rotation measurements and their corresponding zero biases; $\Delta t$ represents the time interval between two gyroscope measurements; $\Delta V_{ij}$ represents the relative velocity change between time i and time j, computed from the k-th accelerometer measurements and their corresponding zero biases; and $\Delta P_{ij}$ represents the relative position change between time i and time j.
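Since the pre-integration formulas themselves are not reproduced in this text, the following is a sketch of the commonly used discrete form, which is an assumption rather than the patent's exact expression: the rotation increment is a product of exponential maps of bias-corrected gyroscope samples, and the velocity and position increments accumulate bias-corrected accelerations rotated into frame i. Gravity and noise terms are omitted, and all function names are hypothetical.

```python
import math

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def mat_mul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_vec(m, v):
    """3x3 matrix times 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def so3_exp(phi):
    """Rodrigues formula: rotation vector -> rotation matrix."""
    theta = math.sqrt(sum(c * c for c in phi))
    if theta < 1e-12:
        return [row[:] for row in IDENTITY]
    x, y, z = (c / theta for c in phi)
    k = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]  # skew matrix of the unit axis
    kk = mat_mul(k, k)
    sin_t, one_minus_cos = math.sin(theta), 1.0 - math.cos(theta)
    return [[IDENTITY[i][j] + sin_t * k[i][j] + one_minus_cos * kk[i][j]
             for j in range(3)] for i in range(3)]


def preintegrate(gyro, accel, dt, gyro_bias=(0.0, 0.0, 0.0), accel_bias=(0.0, 0.0, 0.0)):
    """Accumulate (dR, dV, dP) between frame i and frame j from raw IMU samples.

    The result depends only on the samples and biases, not on the initial state.
    """
    d_rot = [row[:] for row in IDENTITY]
    d_vel = [0.0, 0.0, 0.0]
    d_pos = [0.0, 0.0, 0.0]
    for w, a in zip(gyro, accel):
        a_corr = [a[n] - accel_bias[n] for n in range(3)]
        a_rot = mat_vec(d_rot, a_corr)  # acceleration expressed in frame i
        # Position uses the velocity increment from before this sample is added.
        d_pos = [d_pos[n] + d_vel[n] * dt + 0.5 * a_rot[n] * dt * dt for n in range(3)]
        d_vel = [d_vel[n] + a_rot[n] * dt for n in range(3)]
        w_corr = [(w[n] - gyro_bias[n]) * dt for n in range(3)]
        d_rot = mat_mul(d_rot, so3_exp(w_corr))
    return d_rot, d_vel, d_pos
```

Because the increments are expressed relative to frame i, they can be reused unchanged when the estimate of the state at time i is later corrected, which is the practical meaning of "without relying on initial values".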
[0057] Specific Implementation Method Four: With reference to Figures 1 to 3, this embodiment describes the network in step three, which includes a displacement estimation network and a heading estimation network. The input of the heading estimation network is used as the tenth dimension of the displacement network input. A CNN and an attention network are used to estimate the robot's motion displacement.
[0058] The input-output expressions of the network are:
[0059]
[0060] In the formula, $\alpha$ and $\omega$ represent the acceleration and angular velocity vectors, respectively, and $\Psi_M$ denotes the heading provided by the sensor. The output is the displacement $\Delta L$ in the world coordinate system and the estimated heading.
[0061] Specific Implementation Method Five: With reference to Figures 1 to 3, this embodiment describes how, in step four, the displacement and heading are updated and then input into the pose cells for pose representation.
[0062] Update displacement and heading:
[0063] $X_n = X_{n-1} + \Delta L_n \cos(\psi_{n-1})$
[0064] $Y_n = Y_{n-1} + \Delta L_n \sin(\psi_{n-1})$
[0065] In the formula, X and Y represent the robot's eastward and northward positions, respectively; ΔL represents the displacement estimated by the convolutional neural network; and ψ represents the heading.
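The position update above can be sketched in a few lines. The function and parameter names are hypothetical; the only assumption beyond the two formulas is that each network output is a pair (displacement, new heading), and that the heading from the previous step is the one applied to the current displacement, as the subscript n-1 indicates.

```python
import math


def update_position(x_prev, y_prev, delta_l, heading_prev):
    """Apply X_n = X_{n-1} + dL_n cos(psi_{n-1}) and Y_n = Y_{n-1} + dL_n sin(psi_{n-1})."""
    return (x_prev + delta_l * math.cos(heading_prev),
            y_prev + delta_l * math.sin(heading_prev))


def integrate_track(x0, y0, heading0, estimates):
    """Chain the update over a sequence of (delta_l_n, heading_n) estimates.

    Returns the list of positions, starting with the initial one.
    """
    x, y, heading = x0, y0, heading0
    track = [(x, y)]
    for delta_l, new_heading in estimates:
        x, y = update_position(x, y, delta_l, heading)  # uses the previous heading
        heading = new_heading
        track.append((x, y))
    return track
```

For example, starting at the origin with heading 0, the estimates `[(1.0, math.pi / 2), (2.0, 0.0)]` move the robot one unit along X and then two units along Y.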
[0066] The pose cells maintain dynamic stability of activation levels through an attractor network. The internal dynamic process is divided into three parts: excitation update, global inhibition, and normalization.
[0067] The excitation weight matrix $\varepsilon$ can be represented by a three-dimensional discrete Gaussian matrix, which represents the mapping between each cell and its neighboring cells. Its element $\varepsilon_{abc}$ is the product of the excitability of the planar region (x′, y′) in the pose cell network and the excitability of the head direction θ′, and is calculated as follows:
[0068]
[0069] Where $k_{x'y'}$ is the variance constant of the planar region (x′, y′), $k_{\theta'}$ is the variance constant of the head direction, and a, b, and c are the distribution coefficients of dimensions x′, y′, and θ′, respectively.
[0070] When the most probable pose is detected, the corresponding cell is locally stimulated to cause a change in cell activity. The degree of change ΔP is calculated as follows:
[0071]
[0072] It takes time for multiple active cells to form a sensory cell. During this time, to prevent the weights from being strengthened without limit, a global inhibition variable is introduced that slowly inhibits the activity of the pose cells. Global inhibition is an iterative process, and the excitability of cells after global inhibition is calculated as follows:
[0073]
[0074] After completing the excitability update and global inhibition, the cell activity needs to be normalized. The normalized cell excitability is calculated as follows:
[0075]
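The three-stage update (excitation, global inhibition, normalization) can be sketched as follows. Because the formula images are not reproduced in this text, the concrete expressions used here are assumptions in the style of RatSLAM-type continuous attractor networks: a separable Gaussian kernel, a constant inhibition term with clipping at zero, and division by the total activity. All names and parameter values are hypothetical.

```python
import math
from itertools import product


def excitation_weights(radius, k_xy, k_th):
    """3-D discrete Gaussian kernel: exp(-(a^2 + b^2) / k_xy) * exp(-c^2 / k_th)."""
    offsets = range(-radius, radius + 1)
    return {(a, b, c): math.exp(-(a * a + b * b) / k_xy) * math.exp(-(c * c) / k_th)
            for a, b, c in product(offsets, repeat=3)}


def attractor_step(cells, shape, weights, inhibition):
    """One cycle of excitation, global inhibition, and normalization.

    cells: sparse dict {(x, y, th): activity}; missing keys mean zero activity.
    shape: (nx, ny, nth); all three axes wrap around.
    """
    nx, ny, nth = shape
    excited = dict(cells)
    for (x, y, th), activity in cells.items():
        if activity <= 0.0:
            continue
        # Each active cell spreads activity to its neighbours through the kernel.
        for (a, b, c), w in weights.items():
            key = ((x + a) % nx, (y + b) % ny, (th + c) % nth)
            excited[key] = excited.get(key, 0.0) + activity * w
    # Global inhibition: subtract a constant and clip at zero (assumed form).
    inhibited = {k: max(v - inhibition, 0.0) for k, v in excited.items()}
    total = sum(inhibited.values())
    if total == 0.0:
        return inhibited
    # Normalization keeps the total activity at 1.
    return {k: v / total for k, v in inhibited.items()}
```

Starting from a single active cell, one step produces a small packet of activity centered on that cell whose values sum to 1; repeated steps keep the packet stable rather than letting it grow without bound.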
[0076] Specific Implementation Method Six: With reference to Figures 1 to 3, this embodiment describes step five: after acoustic-visual processing, the external sensor data are input to the underwater robot as local scenes to construct local scene cells.
[0077] The sensor input process is essentially the process of associating local scene cells and pose cells; the connection between the two can be represented as:
[0078]
[0079] Where $V_i$ is the activity of the local scene cell, $P_{x'y'\theta'}$ is the activity of the pose cell, and $\lambda$ is a constant.
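The association between local scene cells and pose cells can be sketched as a Hebbian-style update. The connection formula is not reproduced in this text, so the rule below, in which a link keeps the larger of its current strength and the product of the constant, the scene cell activity, and the pose cell activity, is an assumption borrowed from RatSLAM-type models. The name `learn_associations` is hypothetical.

```python
def learn_associations(links, scene_activity, pose_activity, lam):
    """Strengthen links between co-active local scene cells and pose cells.

    links: dict {(scene_id, pose_key): strength}; updated in place and returned.
    scene_activity: dict {scene_id: V_i}.
    pose_activity: dict {pose_key: P}, with pose_key = (x, y, th).
    lam: the constant scaling the association strength.
    """
    for scene_id, v in scene_activity.items():
        for pose_key, p in pose_activity.items():
            candidate = lam * v * p
            # Assumed rule: a link never weakens, it only grows with co-activation.
            if candidate > links.get((scene_id, pose_key), 0.0):
                links[(scene_id, pose_key)] = candidate
    return links
```

Once learned, such links let a recognized local scene inject activity back into the pose cells it was associated with, which is what allows a familiar scene to correct accumulated odometry drift.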
[0080] Specific Implementation Method Seven: With reference to Figures 1 to 3, this embodiment describes how step six determines whether to perform environmental reconstruction by comparing point cloud similarity.
[0081] The other components and connections in this embodiment are the same as those in specific embodiments one, two, three, four, or five.
[0082] Specific Implementation Method Seven: With reference to Figures 1 to 3, this embodiment describes how a visual template is generated from point cloud data: the point cloud data are transformed into a common coordinate system, and point cloud matching is then performed to obtain the visual template.
[0083] The other components and connections in this embodiment are the same as those in specific embodiments one, two, three, four, or five.
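The two preceding embodiments (coordinate transformation followed by point cloud matching, and a similarity-based decision) can be sketched as follows. The patent does not state which similarity measure is used, so the symmetric mean nearest-neighbour distance below is purely an illustrative assumption, as are the planar point clouds, the names, and the threshold.

```python
import math


def to_world(points, x, y, yaw):
    """Rigidly transform sensor-frame points (px, py) into a common frame."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [(x + c * px - s * py, y + s * px + c * py) for px, py in points]


def mean_nn_distance(cloud_a, cloud_b):
    """Average distance from each point of cloud_a to its nearest neighbour in cloud_b."""
    return sum(min(math.dist(p, q) for q in cloud_b) for p in cloud_a) / len(cloud_a)


def match_template(current, templates, max_distance):
    """Return the index of the most similar stored template, or None.

    None means no stored scene is similar enough, so a new template is needed.
    """
    best_index, best_score = None, max_distance
    for index, template in enumerate(templates):
        # Symmetric score, so a small cloud cannot trivially match a large one.
        score = max(mean_nn_distance(current, template),
                    mean_nn_distance(template, current))
        if score < best_score:
            best_index, best_score = index, score
    return best_index
```

The brute-force nearest-neighbour search is quadratic in the number of points; a real system would more likely use a spatial index or a registration method such as ICP, but the decision logic stays the same.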
[0084] Specific Implementation Method Eight: With reference to Figures 1 to 3, in this embodiment the experience point in step seven is a physical region, and each experience point includes the pose cell activity state P and the visual template V.
[0085] The experience map is a topological map composed of individual experience points $e_i$. Each experience point is a physical region and includes the pose cell activity state P and the visual template V. An experience point can be represented as:
[0086] $e_i = \{P_i, V_i, p_i\}$
[0087] Where $p_i$ is the location of experience point $e_i$ on the experience map.
[0088] The other components and connections in this embodiment are the same as those in specific embodiments one, two, three, four, or five.
[0089] Specific Implementation Method Nine: With reference to Figures 1 to 3, in this embodiment, revising the experience map in step eight refers to integrating all the information obtained by the robot through the experience map and updating the map through loop closure detection, thereby completing the construction of the experience map.
[0090] Specifically, the pose cell encoding and visual template of the current state are compared with the experience points stored in the experience map. If the difference is large enough, a new experience point is created; otherwise, it is determined to be a loop closure point. The comparison formula is:
[0091] $S = \mu_p\,|P_i - P| + \mu_v\,|V_i - V|$
[0092] In the formula, $\mu_p$ and $\mu_v$ are the weights of the pose cell encoding and the visual template, respectively, and are chosen empirically; $P_i$ is the i-th pose cell encoding stored in the experience map; $V_i$ is the i-th visual template stored in the experience map. When $S$ is less than a given threshold $S_{MAX}$, the current scene is confirmed to be a scene that the robot has already experienced, i.e., a loop closure has occurred, and the nodes in the map are updated according to the following formula:
[0093]
[0094] In the formula, $\alpha$ is the correction rate constant, $N_f$ is the number of connections from experience point i to other experience points, and $N_t$ is the number of connections from other experience points to experience point i.
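The comparison score S and the threshold test can be sketched as below. The sketch assumes that the pose cell encoding and the visual template are numeric vectors and that |·| denotes the sum of absolute differences; neither choice is specified in the text, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Experience:
    """Experience point e_i = {P_i, V_i, p_i}."""
    pose_code: tuple  # P_i: pose cell encoding, here (x', y', theta') indices
    template: tuple   # V_i: visual template, here a feature vector
    position: tuple   # p_i: location on the experience map


def l1(a, b):
    """Sum of absolute differences (assumed meaning of |.| in the score)."""
    return sum(abs(u - v) for u, v in zip(a, b))


def score(exp, pose_code, template, mu_p, mu_v):
    """S = mu_p * |P_i - P| + mu_v * |V_i - V|."""
    return mu_p * l1(exp.pose_code, pose_code) + mu_v * l1(exp.template, template)


def find_loop_closure(experiences, pose_code, template, mu_p, mu_v, s_max):
    """Return the index of the best experience with S < s_max, or None.

    None means the current state differs enough that a new experience point is created.
    """
    best, best_s = None, s_max
    for i, exp in enumerate(experiences):
        s = score(exp, pose_code, template, mu_p, mu_v)
        if s < best_s:
            best, best_s = i, s
    return best
```

Choosing the lowest-scoring experience rather than the first one below the threshold is a design choice of this sketch; it avoids closing the loop onto a merely acceptable match when a better one exists.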
[0095] Specific Implementation Method Ten: With reference to Figures 1 to 3, this embodiment describes the construction of the experience map as follows: the pose cell encoding and visual template in the current state are compared with the experience points stored in the experience map. If the difference is large enough, a new experience point is created; otherwise, it is determined to be a loop closure point.
[0096] The judgment method is as follows:
[0097] $S = \mu_p\,|P_i - P| + \mu_v\,|V_i - V|$
[0098] In the formula, $\mu_p$ and $\mu_v$ are the weights of the pose cell encoding and the visual template, respectively, and are chosen empirically; $P_i$ is the i-th pose cell encoding stored in the experience map; $V_i$ is the i-th visual template stored in the experience map. When $S$ is less than a given threshold $S_{MAX}$, the current scene is confirmed to be a scene that the robot has already experienced, i.e., a loop closure has occurred, and the nodes in the map are updated according to the following formula:
[0099]
[0100] In the formula, $\alpha$ is the correction rate constant, $N_f$ is the number of connections from experience point i to other experience points, and $N_t$ is the number of connections from other experience points to experience point i.
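The node update formula is not reproduced in this text. A plausible reading, given the correction rate constant and the counts of outgoing and incoming connections, is a graph relaxation in which every experience point is nudged toward the positions implied by its links. The sketch below implements that reading under a stated assumption: each link (src, dst, delta) means the map should satisfy position[dst] ≈ position[src] + delta. The names and the sign convention are hypothetical.

```python
def relax_map(positions, links, alpha, iterations=1):
    """Nudge experience point positions toward consistency with their links.

    positions: list of [x, y], modified in place and returned.
    links: list of (src, dst, (dx, dy)), the odometry offset stored when the link was made.
    alpha: correction rate constant.
    """
    for _ in range(iterations):
        corrections = [[0.0, 0.0] for _ in positions]
        for src, dst, (dx, dy) in links:
            # Residual: where dst should be according to the link, minus where it is.
            ex = positions[src][0] + dx - positions[dst][0]
            ey = positions[src][1] + dy - positions[dst][1]
            corrections[dst][0] += ex  # incoming link of dst
            corrections[dst][1] += ey
            corrections[src][0] -= ex  # outgoing link of src
            corrections[src][1] -= ey
        for p, c in zip(positions, corrections):
            p[0] += alpha * c[0]
            p[1] += alpha * c[1]
    return positions
```

With two points two units apart and a link that says they should be one unit apart, `alpha = 0.25` moves each point a quarter unit toward the other per iteration, so the drift is spread over both nodes instead of being dumped onto one. A loop closure adds a link between the current and the revisited experience point, and repeated relaxation then distributes the accumulated error around the loop.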
[0101] Specific Implementation Method Eleven: With reference to Figures 1 to 3, this embodiment describes motion compensation: the robot is also moving while the sonar scans the environment, so it cannot be assumed that the origin of all beams is at the same position. The real-time pose of the robot corresponding to each beam must be taken into account.
[0102] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Although the present invention has been disclosed above with reference to preferred embodiments, it is not intended to limit the present invention. Any person skilled in the art can make some modifications or alterations to the above-disclosed technical content to create equivalent embodiments without departing from the scope of the present invention. Any simple modifications, equivalent substitutions, and improvements made to the above embodiments without departing from the scope of the present invention, based on the technical essence of the present invention and within the spirit and principles of the present invention, shall still fall within the protection scope of the present invention.
Claims
1. A biomimetic brain-like synchronous localization and environmental perception method for underwater robots, characterized in that it includes the following steps: Step 1: process the acquired sensor information with the acoustic-visual processing method to form a local scene; Step 2: pre-integrate the IMU data to obtain position, velocity, and rotation increments; Step 3: feed the processed training dataset into the network for learning to obtain the estimated displacement and estimated heading; Step 4: update the position and heading, and input them into the pose cells for pose representation; Step 5: after acoustic-visual processing, input the external sensor data to the underwater robot as local scenes to construct local scene cells; Step 6: obtain the real-time scene at the current moment from the local scenes and match it against previously stored scenes to determine whether to reconstruct the environment; Step 7: connect the pose cell and the local scene cell at a given moment to form the experience point at that moment, wherein the experience point is a physical region, and each experience point includes the pose cell activity state P and the visual template V; Step 8: construct and revise the experience map based on the experience points, wherein revising the experience map refers to integrating all the information obtained by the robot through the experience map and updating the map through loop closure detection, thereby completing the construction of the experience map; the construction of the experience map is specifically as follows: the pose cell encoding and visual template in the current state are compared with the experience points stored in the experience map; if the difference is large enough, a new experience point is created; otherwise, it is determined to be a loop closure point. The judgment method is as follows: $S = \mu_p\,|P_i - P| + \mu_v\,|V_i - V|$, where $\mu_p$ and $\mu_v$ are the weights of the pose cell encoding and the visual template, respectively, and are chosen based on experience; $P_i$ is the i-th pose cell encoding stored in the experience map; $V_i$ is the i-th visual template stored in the experience map; when $S$ is less than a given threshold $S_{MAX}$, the current scene is confirmed to be a scene that the robot has already experienced, i.e., a loop closure has occurred, and the nodes in the map are updated according to the following formula: in the formula, $\alpha$ is the correction rate constant, $N_f$ is the number of connections from experience point i to other experience points, and $N_t$ is the number of connections from other experience points to experience point i.
2. The underwater robot biomimetic brain-like synchronous localization and environmental perception method according to claim 1, characterized in that: the acoustic-visual processing method in step one specifically includes: processing echo intensity information, suppressing measurement noise, and motion compensation.
3. The underwater robot biomimetic brain-like synchronous localization and environmental perception method according to claim 1, characterized in that: The pre-integration in step two can directly calculate the relative pose between two frames, without depending on the initial value.
4. The underwater robot biomimetic brain-like synchronous localization and environmental perception method according to claim 1, characterized in that: The network in step three includes a displacement estimation network and a heading estimation network. The input of the heading estimation network is used as the tenth dimension of the displacement network input. The robot's motion displacement is estimated using a CNN and an attention network.
5. The underwater robot biomimetic brain-like synchronous localization and environmental perception method according to claim 1, characterized in that: The determination of whether to perform environmental reconstruction in step six is based on comparing point cloud similarity.
6. The underwater robot biomimetic brain-like synchronous localization and environmental perception method according to claim 5, characterized in that: Generating a visual template from point cloud data involves transforming the point cloud data into a coordinate system and then matching the points to obtain the visual template.
7. The underwater robot biomimetic brain-like synchronous localization and environmental perception method according to claim 2, characterized in that: in the motion compensation, the robot is also moving while the sonar is scanning the environment, so it cannot be assumed that the origin of all beams is at the same position; the real-time pose of the robot corresponding to each beam must be taken into account.