Acousto-optic fusion guiding system and method for autonomous dynamic docking of underwater unmanned system

By employing a segmented adaptive guidance strategy and multi-sensor fusion technology, the problem of single sensor failure in AUV docking guidance was solved, achieving stable docking in turbid water and under multipath effects, thus enhancing the reliability and accuracy of the guidance system.

CN122192284A · Pending · Publication Date: 2026-06-12 · SHANGHAI JIAOTONG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHANGHAI JIAOTONG UNIV
Filing Date
2026-04-17
Publication Date
2026-06-12

Abstract

The application discloses an acoustic-optical fusion guidance system and method for autonomous dynamic docking of an underwater unmanned system, belonging to the technical field of marine engineering equipment and underwater robots. It addresses two problems in existing AUV docking guidance: a single sensor fails easily in turbid water or under multipath effects, and traditional fusion algorithms handle sensor spatial parallax poorly and lack semantic perception capability. The system comprises a platform-side perception and guidance subsystem, an AUV-side coordination subsystem, and an intelligent switching control module. The application combines multiple heterogeneous sensors, namely USBL, a multibeam forward-looking sonar or sparse array imaging sonar, and an underwater optical camera or underwater lidar, supplemented by an underwater electromagnetic positioning device, to construct a segmented adaptive acoustic-optical fusion guidance scheme covering long, medium, and short range. Through multi-source information fusion, the reliability, stability, and accuracy of autonomous underwater vehicle (AUV) guidance are enhanced.

Description

Technical Field

[0001] This invention belongs to the field of marine engineering equipment and underwater robot technology, specifically relating to an acoustic-optical fusion guidance system and method for autonomous dynamic docking of underwater unmanned systems.

Background Technology

[0002] With the rapid development of marine resource exploration, seabed topography mapping, and military reconnaissance, autonomous underwater vehicles (AUVs) and other underwater unmanned systems have become core equipment for deep-sea operations due to their flexibility, stealth, and intelligence. However, the endurance of AUVs is limited by their energy and data storage capacity, which has become a key bottleneck restricting long-duration operation. To replenish energy and exchange data, underwater docking station technology has emerged, and precise guidance during the dynamic docking process is the core challenge in ensuring the safe recovery of AUVs.

[0003] Currently, AUV docking guidance technology mainly relies on acoustic and optical sensors, but single-mode sensors have significant limitations in complex underwater environments:

[0004] Limitations of acoustic guidance technology: Acoustic sensors such as ultra-short baseline (USBL) positioning systems and multibeam forward-looking sonar (MFLS) have advantages such as long operating range and no illumination limitations, but they are susceptible to shallow water multipath effects and environmental noise interference. In addition, forward-looking sonar images have low resolution and severe speckle noise, and traditional image processing methods struggle to extract accurate target geometric features in complex background clutter.

[0005] Limitations of optical guidance technology: Underwater cameras perform excellently in close-range precision docking, but their effective range is typically only 10-20 meters. In turbid waters, the scattering and absorption of light by suspended particles causes a sharp decrease in image contrast, rendering traditional visual feature point extraction methods ineffective and easily leading to mission failure in the final stage of docking.

[0006] To overcome the limitations of single sensors, multi-sensor fusion has become a mainstream research direction. Existing technologies fall mainly into two categories: fusion schemes based on traditional filtering algorithms and feature fusion schemes based on deep learning. Schemes in the first category typically use extended Kalman filters or federated filters to numerically fuse USBL position data with camera orientation data. These methods lack semantic awareness: they are essentially numerical calculations based on geometric models and cannot interpret the scene. When sonar images contain strong background clutter, or when the camera captures false light sources (such as bioluminescence), the filtering algorithms cannot distinguish targets from noise. Furthermore, they do not account for the spatial parallax between sensors installed at the docking station, so direct fusion easily introduces systematic errors. Schemes in the second category are largely limited to single tasks and lack a systematic design for the specific scenario of docking station guidance. No existing technology provides a complete guidance system that combines a three-level "far-medium-near" adaptive switching mode with an independent dual-branch network that extracts acoustic and optical features at close range and uses spatial cross-attention to handle the parallax between docking station sensors and to adapt to turbidity.

[0007] In summary, to address the issues of single sensor failure in turbid water or multipath effects in existing AUV docking guidance systems, and the difficulty of traditional fusion algorithms in handling sensor spatial parallax and lack of semantic awareness, we propose an acoustic-optical fusion guidance system and method for autonomous dynamic docking of underwater unmanned systems.

Summary of the Invention

[0008] The purpose of this invention is to address the shortcomings of existing technologies by providing an acoustic-optical fusion guidance system and method for autonomous dynamic docking of underwater unmanned systems. This solves the problems of single sensors easily failing in turbid water or under multipath effects in existing AUV docking guidance, as well as the difficulty of traditional fusion algorithms in handling sensor spatial parallax and lack of semantic perception capabilities.

[0009] This invention is implemented as follows: an acoustic-optical fusion guidance system for autonomous dynamic docking of underwater unmanned systems, comprising:

[0010] The platform-side perception and guidance subsystem is deployed on the underwater platform. The platform-side perception and guidance subsystem adopts a segmented adaptive guidance strategy based on relative distance, which divides the docking process into three stages: long distance, medium distance and short distance, and calls the corresponding adaptive guidance method for each stage.

[0011] The AUV-side coordination subsystem is deployed on the autonomous underwater vehicle and includes a USBL transponder beacon and a high-brightness LED light source that serves as a cooperative identification marker.

[0012] The intelligent switching control module, based on a finite state machine-based intelligent switching control strategy, enables a smooth transition between three working modes: long-range, medium-range, and short-range.

[0013] The platform-side perception and guidance subsystem and the AUV-side coordination subsystem work together through acoustic signal interaction and active and passive optical visual feature matching to complete the docking task.

[0014] Preferably, the platform-side sensing and guidance subsystem includes:

[0015] A USBL positioning array is used to receive and process acoustic signals from the AUV-side coordination subsystem during the long-range phase to obtain the initial position information of the autonomous underwater vehicle.

[0016] Multibeam forward-looking sonar is used to scan the waters ahead at medium and close ranges to acquire two-dimensional acoustic images containing echoes from autonomous underwater vehicles.

[0017] An underwater optical camera is used to acquire optical images of autonomous underwater vehicles at close range.

[0018] The computational control unit calls the corresponding adaptive guidance method for each stage, triggers the execution of the segmented adaptive guidance algorithm, and outputs guidance control commands.

[0019] Preferably, the computational control unit invokes corresponding adaptive guidance methods for each stage, including:

[0020] In the long-range phase, the autonomous underwater vehicle's position coordinates are output smoothly based on USBL positioning data and an adaptive robust Kalman filter algorithm.

[0021] In the mid-range phase, the autonomous underwater vehicle's two-dimensional coordinates are output based on forward-looking sonar images and a lightweight target detection network combined with a Transformer tracking network, guiding the autonomous underwater vehicle to correct its horizontal heading.

[0022] During the close-range phase, a deep fusion network based on acoustic and optical images is used to calculate the relative position of the autonomous underwater vehicle and control it to complete the docking.

[0023] Preferably, the method for outputting smooth autonomous underwater vehicle position coordinates based on USBL positioning data and an adaptive robust Kalman filter algorithm includes:

[0024] Acquire USBL positioning data of autonomous underwater vehicles and establish a state-space model of autonomous underwater vehicles based on the USBL positioning data;

[0025] An IGGIII robust estimation mechanism is constructed: the IGGIII three-segment equivalent weight function is introduced to build a robustness factor and is applied to the measurement residuals. When a residual exceeds the elimination threshold, the corresponding measurement is judged to be an outlier and is down-weighted or eliminated.

[0026] Sage-Husa adaptive estimation is introduced to estimate and correct the measurement noise covariance matrix online in real time, producing smooth autonomous underwater vehicle position coordinates with outliers removed. These coordinates drive the autonomous underwater vehicle to adjust its heading angle and depth so that it stably approaches the sonar range of the underwater platform.
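
The filtering steps above can be illustrated with a minimal scalar sketch. It is not the patented implementation: it assumes a one-dimensional random-walk state instead of the full state-space model, standardizes the residual by the innovation standard deviation, uses illustrative IGG III breakpoints `k0` and `k1` and forgetting factor `b`, and uses the post-fit residual form of the Sage-Husa measurement-noise update (chosen here because it keeps the variance positive). All function and parameter names are hypothetical.

```python
import math


def igg3_weight(std_residual, k0=1.5, k1=4.5):
    """IGG III three-segment equivalent weight for a standardized residual.
    k0 and k1 are illustrative breakpoints, not values from the source."""
    a = abs(std_residual)
    if a <= k0:
        return 1.0                      # keep segment: full weight
    if a <= k1:
        # down-weight segment: weight decays smoothly towards zero at k1
        return (k0 / a) * ((k1 - a) / (k1 - k0)) ** 2
    return 0.0                          # reject segment: treated as an outlier


def robust_adaptive_filter(measurements, q=0.01, r0=1.0, b=0.95):
    """Scalar random-walk Kalman filter with IGG III weighting and a
    Sage-Husa style recursive estimate of the measurement noise variance r."""
    x, p, r = measurements[0], 1.0, r0
    out = [x]
    for k, z in enumerate(measurements[1:], start=1):
        p_pred = p + q                  # time update (state assumed unchanged)
        innov = z - x                   # measurement residual (innovation)
        s = p_pred + r                  # innovation variance
        w = igg3_weight(innov / math.sqrt(s))
        if w == 0.0:
            p = p_pred                  # outlier: skip the measurement update
            out.append(x)
            continue
        r_eq = r / w                    # equivalent (inflated) noise variance
        gain = p_pred / (p_pred + r_eq)
        x = x + gain * innov
        p = (1.0 - gain) * p_pred
        if w == 1.0:
            # Sage-Husa fading-memory update from the post-fit residual,
            # applied only to measurements that were not down-weighted
            d = (1.0 - b) / (1.0 - b ** (k + 1))
            post = z - x
            r = (1.0 - d) * r + d * (post * post + p)
        out.append(x)
    return out


if __name__ == "__main__":
    # A multipath-like spike at index 3 is rejected; the estimate stays near 10.
    print(robust_adaptive_filter([10.0, 10.1, 9.9, 50.0, 10.0]))
```

In a full implementation the same weighting would be applied per measurement component to the vector residual, with the state-space model supplying the prediction step.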

[0027] Preferably, when the autonomous underwater vehicle's two-dimensional coordinates are output based on forward-looking sonar images and a lightweight target detection network combined with a Transformer tracking network, the lightweight target detection network is a standard YOLO lightweight network architecture, and continuous localization of underwater acoustic targets is achieved through acoustic target tracking based on a Transformer network model.

[0028] Preferably, the method for outputting the two-dimensional coordinates of an autonomous underwater vehicle based on forward-looking sonar images and a lightweight target detection network combined with a Transformer tracking network includes:

[0029] Multibeam forward-looking sonar images or sparse array imaging sonar images are acquired in real time and integrated into a two-dimensional acoustic image, which is used as the input of the lightweight target detection network.

[0030] End-to-end target detection of two-dimensional acoustic images is performed using a standard YOLO series lightweight network architecture.

[0031] Continuous localization of underwater acoustic targets is achieved through acoustic target tracking based on the Transformer network model;

[0032] Based on the target detection results of the sonar images, the two-dimensional coordinates of the autonomous underwater vehicle in the sonar scanning plane are calculated. These coordinates are used to guide the autonomous underwater vehicle to correct its course in the horizontal plane and approach the entrance of the underwater platform.
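
The last step, turning a detection box into coordinates in the sonar scanning plane, might look like the sketch below. It assumes the two-dimensional acoustic image is a raw range-by-beam (polar) image with row 0 at zero range and columns spread evenly across the sonar field of view. The image layout, the function name, and all parameters are assumptions for illustration and are not specified in the source.

```python
import math


def sonar_box_to_xy(box, n_ranges, n_beams, max_range_m, fov_deg):
    """Map the centre of a detection box to Cartesian coordinates in the
    sonar scanning plane.

    box = (row_min, col_min, row_max, col_max) in pixel coordinates, where
    rows index range bins (0 = sonar head) and columns index beams
    (0 = leftmost beam). Returns (x_forward, y_right) in metres.
    """
    row_min, col_min, row_max, col_max = box
    row_c = 0.5 * (row_min + row_max)           # box centre along range
    col_c = 0.5 * (col_min + col_max)           # box centre along bearing
    rng = row_c / n_ranges * max_range_m        # range of the centre, metres
    bearing = math.radians((col_c / n_beams - 0.5) * fov_deg)
    return rng * math.cos(bearing), rng * math.sin(bearing)


if __name__ == "__main__":
    # A box centred on the middle beam at half the maximum range.
    print(sonar_box_to_xy((40, 60, 60, 68), 100, 128, 50.0, 120.0))
```

The lateral offset returned here is the quantity a heading controller would drive towards zero when correcting course in the horizontal plane.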

[0033] Preferably, the method for calculating the relative position of an autonomous underwater vehicle and controlling its docking using a deep fusion network based on acoustic and optical images includes:

[0034] Acquire RGB optical images from underwater optical cameras, and acoustic images from multibeam forward-looking sonar or sparse array imaging sonar.

[0035] A dual-branch feature fusion network based on spatial cross-attention is adopted to extract heterogeneous features through optical feature branches and acoustic feature branches respectively;

[0036] The correlation matrix of acoustic and optical features is calculated by the spatial cross-attention module. The physical parallax of the sonar and camera is implicitly corrected and complementary enhancement is performed at the feature layer. With the help of the underwater electromagnetic positioning device, the position of the autonomous underwater vehicle relative to the underwater platform is directly calculated, and the autonomous underwater vehicle is controlled to complete the final docking.
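
As a rough illustration of how a spatial cross-attention step can relate the two modalities, the sketch below lets each optical feature token attend over all acoustic feature tokens. It is a simplified stand-in, not the network in the source: it omits learned query/key/value projections, uses the optical branch as the query side (an assumption), and represents features as plain Python lists. All names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(optical_tokens, acoustic_tokens):
    """Each optical token (query) attends over all acoustic tokens
    (keys and values). Tokens are equal-length feature vectors, one per
    spatial location. Returns (fused_tokens, correlation_matrix)."""
    d = len(optical_tokens[0])
    scale = 1.0 / math.sqrt(d)
    fused, corr = [], []
    for q in optical_tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k))
                  for k in acoustic_tokens]
        attn = softmax(scores)          # one row of the correlation matrix
        context = [sum(w * v[i] for w, v in zip(attn, acoustic_tokens))
                   for i in range(d)]
        # residual connection: optical feature enhanced by acoustic context
        fused.append([qi + ci for qi, ci in zip(q, context)])
        corr.append(attn)
    return fused, corr


if __name__ == "__main__":
    optical = [[1.0, 0.0], [0.0, 1.0]]
    acoustic = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    fused, corr = cross_attention(optical, acoustic)
    print(corr)
```

Because every optical location can draw on any acoustic location, the attention weights can absorb a fixed spatial offset between the two sensors, which is one way to read the "implicit parallax correction" described above.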

[0037] Preferably, the dual-branch feature fusion network includes an optical feature branch and an acoustic feature branch. The optical feature branch uses ResNet50 as the backbone network to extract the high-brightness spot, color texture, and edge contour features of the LED light source at the bow of the autonomous underwater vehicle. The acoustic feature branch uses ResNet18 as the backbone network.

[0038] Preferably, the intelligent switching control strategy of the finite state machine introduces a dual verification mechanism combining a distance hysteresis interval with detection confidence, and predefines three discrete working states corresponding to the long-range, medium-range, and short-range modes, respectively. It also presets a long-to-medium-range switching threshold, a medium-to-short-range switching threshold, and a hysteresis tolerance that prevents jumps at critical points;

[0039] The intelligent switching control strategy based on finite state machines achieves a smooth transition between long-range, medium-range, and short-range operating modes, including:

[0040] The condition for switching from long-range mode to medium-range mode is: the estimated distance output by the USBL filter is less than the long-to-medium-range switching threshold for N1 consecutive frames;

[0041] The condition for switching from medium-range mode to short-range mode is: the sonar-calculated range is less than the medium-to-short-range switching threshold, and the target detection confidence is higher than the safety threshold;

[0042] The condition for reverting from short-range mode to medium-range mode is: the distance solved by the deep fusion network exceeds [a certain value], or the regression confidence stays below the minimum threshold within a continuous time window T.
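
The three switching conditions can be sketched as a small finite state machine. Two points are assumptions rather than statements from the source: the rollback distance is taken here as the medium-to-short threshold plus the hysteresis tolerance (the source leaves that value unspecified), and the time window T is counted in frames. Class, field, and parameter names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    FAR = "long-range"
    MID = "medium-range"
    NEAR = "short-range"


@dataclass
class SwitchConfig:
    d_far_mid: float    # long-to-medium-range switching threshold (m)
    d_mid_near: float   # medium-to-short-range switching threshold (m)
    hysteresis: float   # hysteresis tolerance (m)
    n1: int             # consecutive frames required for FAR -> MID
    conf_safe: float    # detection confidence required for MID -> NEAR
    conf_min: float     # minimum regression confidence while in NEAR
    t_window: int       # low-confidence frames tolerated (assumed unit)


class GuidanceFSM:
    def __init__(self, cfg):
        self.cfg = cfg
        self.mode = Mode.FAR
        self._below = 0      # consecutive frames under d_far_mid
        self._low_conf = 0   # consecutive low-confidence frames in NEAR

    def step(self, distance, confidence=1.0):
        """Feed one frame (distance estimate of the active stage and its
        confidence) and return the mode after applying the switching rules."""
        c = self.cfg
        if self.mode is Mode.FAR:
            self._below = self._below + 1 if distance < c.d_far_mid else 0
            if self._below >= c.n1:
                self.mode, self._below = Mode.MID, 0
        elif self.mode is Mode.MID:
            if distance < c.d_mid_near and confidence > c.conf_safe:
                self.mode, self._low_conf = Mode.NEAR, 0
        else:  # Mode.NEAR
            self._low_conf = self._low_conf + 1 if confidence < c.conf_min else 0
            # ASSUMPTION: rollback distance = d_mid_near + hysteresis
            too_far = distance > c.d_mid_near + c.hysteresis
            if too_far or self._low_conf >= c.t_window:
                self.mode = Mode.MID
        return self.mode


if __name__ == "__main__":
    fsm = GuidanceFSM(SwitchConfig(100.0, 20.0, 2.0, 3, 0.6, 0.3, 5))
    for dist, conf in [(99, 1.0), (98, 1.0), (97, 1.0), (19, 0.9), (21, 0.9), (23, 0.9)]:
        print(dist, conf, fsm.step(dist, conf).value)
```

The gap between the entry threshold and the rollback distance is what prevents the mode from chattering when the distance estimate hovers around the switching point.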

[0043] On the other hand, the present invention also provides an acoustic-optical fusion guidance method for autonomous dynamic docking of underwater unmanned systems. Specifically, the acoustic-optical fusion guidance method for autonomous dynamic docking of underwater unmanned systems includes:

[0044] S10, Long-range phase: The platform-side perception and guidance subsystem acquires the acoustic positioning data of the autonomous underwater vehicle through the USBL array, uses an adaptive robust Kalman filter algorithm to suppress multipath noise, outputs smooth autonomous underwater vehicle position coordinates, and guides the autonomous underwater vehicle to approach the underwater platform.

[0045] S20, mid-range phase: Multibeam forward-looking sonar or sparse array imaging sonar is activated, and YOLO series lightweight network is used to identify the target autonomous underwater vehicle from two-dimensional acoustic images. Combined with Transformer tracking network, the two-dimensional coordinates of the autonomous underwater vehicle are output to guide the autonomous underwater vehicle to correct its horizontal heading.

[0046] S30, close-range phase: The optical camera and sonar are activated simultaneously, and a dual-branch feature fusion network based on spatial cross-attention achieves implicit alignment and complementary enhancement of acoustic and optical data at the feature layer. Supplemented by an underwater electromagnetic positioning device, the relative position of the autonomous underwater vehicle is calculated and the vehicle is controlled to complete the docking.

[0047] S40, throughout the entire guidance process: An intelligent switching control strategy based on a finite state machine is used to achieve smooth transitions between stages, and a rollback mechanism is triggered when an anomaly is detected.

[0048] Compared with the prior art, the embodiments of this application have the following main advantages:

[0049] The embodiments of this invention consist of a platform-side perception and guidance subsystem, an AUV-side coordination subsystem, and an intelligent switching control module. It integrates various heterogeneous sensors, such as USBL, multi-beam forward-looking sonar or sparse array imaging sonar, underwater optical camera or underwater lidar, and is supplemented by an underwater electromagnetic positioning device. It constructs a segmented adaptive underwater unmanned system acoustic-optical fusion guidance technology system covering long-range, medium-range to short-range. Through multi-source information fusion, it enhances the reliability, stability and accuracy of autonomous underwater vehicle (AUV) guidance.

[0050] At long range, the system uses only USBL combined with robust Kalman filtering to suppress underwater acoustic multipath noise and guide the autonomous underwater vehicle to approach with minimal energy consumption. At medium range, forward-looking sonar and YOLO series lightweight networks are employed to overcome the reliance of traditional methods on hand-crafted features and to quickly lock the target's two-dimensional position under low light conditions. At close range, the system integrates acoustic and optical dual-modal data, which not only compensates for the failure risk of pure vision in turbid water but also uses high-resolution textures to correct the jitter of acoustic positioning, thereby supporting a stable docking success rate.

[0051] Secondly, this invention utilizes a finite state machine-based intelligent switching control strategy to achieve a smooth transition between long-range, medium-range, and short-range operating modes, enhancing the dynamic stability of the guidance process. Addressing the tendency for underwater sensor data to fluctuate drastically, this invention abandons simple hard switching based on distance thresholds and introduces a distance hysteresis interval and detection confidence gating mechanism. This not only effectively eliminates control oscillations caused by signal fluctuations at critical distances for the AUV but also endows the system with fault-tolerant protection capabilities.

Attached Figure Description

[0052] Figure 1 is a schematic diagram of the platform-side sensing guidance subsystem provided by the present invention.

[0053] Figure 2 is a schematic diagram of the lightweight network architecture of the YOLO series.

[0054] Figure 3 is a schematic diagram of the dual-branch feature fusion network architecture.

[0055] Figure 4 is a schematic diagram illustrating the principle by which the spatial cross-attention module calculates the correlation matrix of acoustic and optical features.

[0056] Figure 5 is a schematic diagram of the implementation process for achieving a smooth transition between the three working modes (long-range, medium-range, and short-range) based on the finite state machine-based intelligent switching control strategy.

[0057] Figure 6 is a schematic diagram of multibeam forward-looking sonar image sampling and sonar target tracking in an embodiment of the present invention.

[0058] Figure 7 is a schematic diagram of an RGB optical image captured by an underwater optical camera in an embodiment of the present invention.

Detailed Implementation

[0059] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used herein in the specification of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having," and any variations thereof, in the specification, claims, and foregoing drawings of this application are intended to cover non-exclusive inclusion. The terms "first," "second," etc., in the specification, claims, or foregoing drawings of this application are used to distinguish different objects, not to describe a particular order.

[0060] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.

[0061] To address the issues of single-sensor failure in turbid waters or under multipath effects in existing AUV docking guidance systems, and the difficulty of traditional fusion algorithms in handling sensor spatial parallax and lack of semantic perception capabilities, we propose an acoustic-optical fusion guidance system and method for autonomous dynamic docking of underwater unmanned systems. In short, the acoustic-optical fusion guidance system for autonomous dynamic docking of underwater unmanned systems consists of a platform-side perception and guidance subsystem, an AUV-side coordination subsystem, and an intelligent switching control module. The platform-side perception and guidance subsystem and the AUV-side coordination subsystem interact through acoustic signals and work collaboratively through active and passive optical visual feature matching to jointly complete the docking task between the underwater platform and the autonomous underwater vehicle. The embodiments of this invention consist of a platform-side perception and guidance subsystem, an AUV-side coordination subsystem, and an intelligent switching control module. It integrates various heterogeneous sensors, such as USBL, multi-beam forward-looking sonar or sparse array imaging sonar, underwater optical camera or underwater lidar, and is supplemented by an underwater electromagnetic positioning device. It constructs a segmented adaptive underwater unmanned system acoustic-optical fusion guidance technology system covering long-range, medium-range to short-range. Through multi-source information fusion, it enhances the reliability, stability and accuracy of autonomous underwater vehicle (AUV) guidance.

[0062] Example 1

[0063] This invention provides an acoustic-optical fusion guidance system for autonomous dynamic docking of underwater unmanned systems. Specifically, the acoustic-optical fusion guidance system for autonomous dynamic docking of underwater unmanned systems includes:

[0064] The platform-side perception and guidance subsystem is deployed on the underwater platform. The platform-side perception and guidance subsystem adopts a segmented adaptive guidance strategy based on relative distance, which divides the docking process into three stages: long distance, medium distance and short distance, and calls the corresponding adaptive guidance method for each stage.

[0065] The AUV-side coordination subsystem is deployed on the autonomous underwater vehicle and includes a USBL transponder beacon and a high-brightness LED light source that serves as a cooperative identification marker.

[0066] The intelligent switching control module, based on a finite state machine-based intelligent switching control strategy, enables a smooth transition between three working modes: long-range, medium-range, and short-range.

[0067] The platform-side perception and guidance subsystem and the AUV-side coordination subsystem work together through acoustic signal interaction and active and passive optical visual feature matching to complete the docking task.

[0068] In this embodiment, as shown in Figure 1, the platform-side perception and guidance subsystem is the decision-making and perception core of the entire guidance mission. It is deployed in the entrance area of the underwater platform, which is located below surface ships or docks and is used to recover autonomous underwater vehicles (AUVs). Specifically, a USBL (Ultra-Short Baseline) positioning array is installed on the underwater platform to receive acoustic signals emitted by the AUV at long range and calculate its relative slant range and azimuth. At the entrance of the underwater platform, a multibeam forward-looking sonar is installed, with its transducer array facing forward. It continuously scans the fan-shaped water area in front at medium and close range to acquire acoustic images containing AUV echoes. Simultaneously, an underwater optical camera is also installed at the entrance of the underwater platform to capture the visual texture features of the AUV at close range.

[0069] The AUV-side coordination subsystem, acting as the guided object, is primarily responsible for responding to positioning inquiries and providing enhanced optical features to assist the platform's detection. A USBL transponder beacon is installed in the AUV's communication section. Upon receiving an interrogation pulse from the platform's USBL positioning array, the beacon immediately replies with an acoustic signal at a specific frequency, establishing a long-range acoustic communication link. Simultaneously, a high-brightness LED light source is installed at the bow of the AUV. This light source serves as a visual cooperation marker, activated during close-range guidance. It provides a high-contrast light spot feature in turbid water, assisting the platform's camera in quickly locking onto the AUV target in low-light conditions, thus compensating for insufficient natural light. It should be noted that the AUV-side coordination subsystem also includes at least one set of thrusters, an inertial navigation system, a controller, a power unit, a propeller, and a communication antenna, all of which are installed within the autonomous underwater vehicle.

[0070] The embodiments of this invention consist of a platform-side perception and guidance subsystem, an AUV-side coordination subsystem, and an intelligent switching control module. It integrates various heterogeneous sensors, such as USBL, multi-beam forward-looking sonar or sparse array imaging sonar, underwater optical camera or underwater lidar, and is supplemented by an underwater electromagnetic positioning device. It constructs a segmented adaptive underwater unmanned system acoustic-optical fusion guidance technology system covering long-range, medium-range to short-range. Through multi-source information fusion, it enhances the reliability, stability and accuracy of autonomous underwater vehicle (AUV) guidance.

[0071] In this embodiment of the invention, the platform-side sensing guidance subsystem includes:

[0072] A USBL positioning array is used to receive and process acoustic signals from the AUV-side coordination subsystem during the long-range phase to obtain the initial position information of the autonomous underwater vehicle.

[0073] Multibeam forward-looking sonar is used to scan the waters ahead at medium and close ranges to acquire two-dimensional acoustic images containing echoes from autonomous underwater vehicles.

[0074] An underwater optical camera is used to acquire optical images of autonomous underwater vehicles at close range.

[0075] The computational control unit calls the corresponding adaptive guidance method for each stage, triggers the execution of the segmented adaptive guidance algorithm, and outputs guidance control commands.

[0076] At long range, the system uses only USBL combined with robust Kalman filtering to suppress underwater acoustic multipath noise and guide the autonomous underwater vehicle to approach with minimal energy consumption. At medium range, forward-looking sonar and YOLO series lightweight networks are employed to overcome the reliance of traditional methods on hand-crafted features and to quickly lock the target's two-dimensional position under low light conditions. At close range, the system integrates acoustic and optical dual-modal data, which not only compensates for the failure risk of pure vision in turbid water but also uses high-resolution textures to correct the jitter of acoustic positioning, thereby supporting a stable docking success rate.

[0077] Secondly, this invention utilizes a finite state machine-based intelligent switching control strategy to achieve a smooth transition between long-range, medium-range, and short-range operating modes, enhancing the dynamic stability of the guidance process. Addressing the tendency for underwater sensor data to fluctuate drastically, this invention abandons simple hard switching based on distance thresholds and introduces a distance hysteresis interval and detection confidence gating mechanism. This not only effectively eliminates control oscillations caused by signal fluctuations at critical distances for the AUV but also endows the system with fault-tolerant protection capabilities.

[0078] Finally, this invention utilizes a deep network based on spatial cross-attention to achieve implicit alignment and complementary enhancement of acoustic and optical data in the feature dimension without requiring complex external joint calibration of sonar and camera. This method enables the system to accurately identify weakly textured autonomous underwater vehicle targets from a strong reverberant background, effectively reducing the false alarm rate and demonstrating significant engineering application value.

[0079] This invention provides a computational control unit that invokes corresponding adaptive guidance methods for each stage. Specifically, the invocation of these methods by the computational control unit for each stage includes:

[0080] In the long-range phase, the autonomous underwater vehicle's position coordinates are output smoothly based on USBL positioning data and an adaptive robust Kalman filter algorithm.

[0081] In the mid-range phase, the autonomous underwater vehicle's two-dimensional coordinates are output based on forward-looking sonar images and a lightweight target detection network combined with a Transformer tracking network, guiding the autonomous underwater vehicle to correct its horizontal heading.

[0082] During the close-range phase, a deep fusion network based on acoustic and optical images is used to calculate the relative position of the autonomous underwater vehicle and control it to complete the docking.

[0083] In this embodiment, the long-range stage may correspond to a relative distance of ≥ 100 m, the medium-range stage to a relative distance of ≥ 20 m and < 100 m, and the short-range stage to a relative distance of < 20 m.

[0084] During the long-range guidance phase, the underwater platform can only acquire the AUV's acoustic positioning data through the USBL array. To address the problem of frequent outliers in the observation data and unstable statistical characteristics of measurement noise caused by strong multipath effects in shallow water, this invention employs an adaptive robust Kalman filter algorithm. This algorithm integrates robust estimation and Sage-Husa adaptive filtering. Specifically, this invention provides a method for obtaining smooth AUV position coordinates based on USBL positioning data and the adaptive robust Kalman filter algorithm. This method for obtaining smooth AUV position coordinates based on USBL positioning data and the adaptive robust Kalman filter algorithm includes:

[0085] S101, acquire USBL positioning data of autonomous underwater vehicle, and establish state space model of autonomous underwater vehicle based on USBL positioning data;

[0086] When establishing the state-space model of the autonomous underwater vehicle (AUV), the state vector of the AUV in the underwater platform coordinate system is set as $X_k = [x_k, y_k, z_k, \dot{x}_k, \dot{y}_k, \dot{z}_k]^T$, where $(x_k, y_k, z_k)$ are the position coordinates and $(\dot{x}_k, \dot{y}_k, \dot{z}_k)$ are the velocity components. Assuming the AUV undergoes approximately uniform linear motion during this phase, the state equation and measurement equation of the discrete system are established:

[0087] $$X_k = \Phi_{k|k-1} X_{k-1} + W_{k-1}, \qquad Z_k = H_k X_k + V_k$$

[0088] where $\Phi_{k|k-1}$ is the state transition matrix, $Z_k$ is the original position observation vector calculated by the USBL array, $H_k$ is the measurement matrix, and $W_{k-1}$ and $V_k$ are the process noise and measurement noise, respectively.
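The uniform-linear-motion assumption above can be illustrated with a minimal sketch. It assumes a fixed sampling interval `dt` and position-only USBL measurements; the function names, `dt`, and the numeric state are illustrative and do not come from the source.

```python
def cv_model(dt):
    """Return (phi, h) for a 3-D constant-velocity model with state
    [x, y, z, vx, vy, vz] and position-only measurements (assumed layout)."""
    phi = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
    for i in range(3):
        phi[i][i + 3] = dt          # position advances by velocity * dt
    h = [[1.0 if j == i else 0.0 for j in range(6)] for i in range(3)]
    return phi, h


def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


phi, h = cv_model(dt=0.5)
state = [100.0, 20.0, -5.0, -2.0, 0.0, 0.1]   # illustrative values only
predicted = mat_vec(phi, state)               # noise-free state prediction
expected_fix = mat_vec(h, predicted)          # position the USBL fix is compared with
```

The difference between the actual USBL fix and `expected_fix` is the predicted residual that the robust step below operates on.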

[0089] S102: construct the IGGIII robust mechanism. An IGGIII three-segment equivalent weight function is introduced to construct a robust factor and to apply robust processing to the measurement residuals. When a residual exceeds the elimination threshold, it is judged to be an outlier and is down-weighted or eliminated.

[0090] In this embodiment, to eliminate anomalous data jumps caused by multipath effects, an IGGIII three-segment equivalent weight function is introduced to construct a robust factor. First, the predicted residual vector is calculated and standardized to obtain the standardized residuals. The equivalent covariance matrix of the current measurement is then dynamically adjusted according to the magnitude of the standardized residuals:

[0091]

[0092] The formula uses two thresholds, a protection threshold and an elimination threshold, and is interpreted as follows:

[0093] ① Normal segment: when the residual is less than the protection threshold, the data is considered reliable and the original weights are maintained.

[0094] ② Suspicious segment: when the residual lies between the protection threshold and the elimination threshold, a slight disturbance is assumed and the measurement covariance is inflated by a weighting factor.

[0095] ③ Abnormal segment: when the residual is greater than the elimination threshold, the measurement is identified as an outlier, its weight is set to zero, and the predicted value is used in place of the observed value to prevent the filter from diverging.
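The three segments can be sketched per measurement axis as follows. This is a hedged illustration: the middle-segment factor is the commonly published IGG III form and may differ from the formula in the original filing, and the thresholds `k0 = 1.5` and `k1 = 4.5` are assumed values, not taken from the source.

```python
import math


def igg3_inflation(v_std, k0=1.5, k1=4.5):
    """Variance inflation factor for one standardized residual.

    Returns 1.0 in the normal segment, a factor > 1 in the suspicious
    segment, and math.inf in the abnormal segment (measurement rejected).
    k0 (protection) and k1 (elimination) are assumed threshold values.
    """
    a = abs(v_std)
    if a <= k0:
        return 1.0
    if a <= k1:
        # Common IGG III weight: shrinks smoothly from 1 at k0 to 0 at k1.
        weight = (k0 / a) * ((k1 - a) / (k1 - k0)) ** 2
        return math.inf if weight == 0.0 else 1.0 / weight
    return math.inf


def equivalent_variance(r, v_std, k0=1.5, k1=4.5):
    """Equivalent measurement variance: the nominal variance r, inflated."""
    return r * igg3_inflation(v_std, k0, k1)
```

An infinite equivalent variance drives the Kalman gain for that axis to zero, which matches the behaviour described for the abnormal segment: the update falls back on the predicted value.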

[0096] S103: introduce Sage-Husa adaptive estimation to estimate and correct the measurement noise covariance matrix online in real time, and output smoothed autonomous underwater vehicle position coordinates with outliers removed, which drive the autonomous underwater vehicle to adjust its heading angle and depth so that it stably approaches the sonar range of the underwater platform.

[0097] In this embodiment, to address the unknown or time-varying noise statistics caused by the complex and variable underwater acoustic environment, the Sage-Husa estimator is used to estimate and correct the measurement noise covariance matrix online in real time:

[0098]

[0099] where the formula involves a forgetting factor, a forgetting rate, and the prediction error covariance matrix.
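For orientation, a scalar (single-axis) version of the widely used Sage-Husa measurement-noise recursion is sketched below. It is an assumption that the filing uses this exact form; the function name, the value `b = 0.97`, and the positivity floor `r_floor` are illustrative choices that do not appear in the source.

```python
def sage_husa_r(r_prev, innovation, hph, k, b=0.97, r_floor=1e-6):
    """One scalar Sage-Husa update of the measurement noise variance.

    r_prev     : previous estimate of the measurement noise variance
    innovation : predicted residual (measurement minus predicted measurement)
    hph        : H P_pred H^T, the share of residual variance due to the state
    k          : zero-based time step
    b          : fading parameter in (0, 1); assumed value
    """
    d_k = (1.0 - b) / (1.0 - b ** (k + 1))     # fading weight, equals 1 at k = 0
    r_new = (1.0 - d_k) * r_prev + d_k * (innovation ** 2 - hph)
    return max(r_new, r_floor)                 # guard: keep the variance positive
```

The floor is a practical safeguard added here because the subtraction can produce a negative value when the residual is small; it is not described in the source.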

[0100] After the above processing, the output is a smoothed AUV position coordinate with outliers removed. This data is directly input to the AUV controller, driving the AUV to adjust its heading angle and depth, making it stably approach the sonar range of the underwater platform.

[0101] During the mid-range guidance phase, the system activates the multibeam forward-looking sonar or sparse array imaging sonar and uses a YOLO-series lightweight network to quickly identify the target AUV in the two-dimensional acoustic image. The lightweight target detection network is a standard YOLO-series lightweight network architecture. Acoustic target tracking based on a Transformer network model then provides continuous positioning of the underwater acoustic target.

[0102] This invention provides a method for outputting the two-dimensional coordinates of an autonomous underwater vehicle (AUV) based on forward-looking sonar images, a lightweight target detection network, and a Transformer tracking network. Specifically, this method includes:

[0103] S201: acquire multi-beam forward-looking sonar images or sparse array imaging sonar images in real time, integrate them into a two-dimensional acoustic image, and use the two-dimensional acoustic image as the input of the lightweight target detection network.

[0104] S202: use a standard YOLO-series lightweight network architecture to perform end-to-end target detection on the two-dimensional acoustic image;

[0105] Figure 2 shows a schematic of the YOLO-series lightweight network architecture. The network includes a backbone network, a neck network, at least one set of parallel detection heads, category and bounding-box outputs, and a result output layer. The backbone network receives the two-dimensional acoustic image. The backbone network is connected to the neck network, the neck network to the parallel detection heads, and the parallel detection heads to the category and bounding-box outputs.

[0106] S203: achieve continuous localization of the underwater acoustic target through acoustic target tracking based on the Transformer network model;

[0107] The YOLO-series lightweight network outputs the predicted bounding box of the AUV in the sonar image, and the pixel coordinates of its geometric center are extracted. Based on the imaging geometry model of the forward-looking sonar, the pixel coordinates are converted into two-dimensional planar physical coordinates of the AUV relative to the sonar transducer. The conversion uses the maximum physical range of the sonar, the horizontal fan opening angle, the image width, and the image height;

[0108] The slant distance is calculated as follows:

[0109]

[0110] The azimuth angle is calculated as follows:

[0111]

[0112] The two-dimensional physical coordinates are calculated as follows:

[0113]

[0114] The algorithm ultimately outputs the two-dimensional coordinates of the AUV in the sonar scanning plane. This data is used to guide the AUV to correct its course in the horizontal plane and approach the underwater platform entrance.

[0115] S204: based on the target detection results of the sonar image, calculate the two-dimensional coordinates of the autonomous underwater vehicle in the sonar scanning plane. These coordinates are used to guide the autonomous underwater vehicle to correct its course in the horizontal plane and approach the entrance of the underwater platform.
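The pixel-to-physical conversion can be sketched under one assumed image layout. The layout (a rectangular range-bearing image with rows mapped to range and columns mapped to bearing), the function name, and all parameter names are assumptions made for illustration; the source does not state which layout its formulas use, and a fan-shaped display image would need a different mapping.

```python
import math


def pixel_to_plane(u, v, width, height, r_max, fov_deg):
    """Convert a bounding-box centre (u, v) in a range-bearing sonar image
    to (slant, azimuth, x, y) relative to the sonar transducer.

    Assumed layout: columns span the horizontal fan from -fov/2 (left) to
    +fov/2 (right); rows span range from r_max (top row) to 0 (bottom row).
    """
    slant = r_max * (height - v) / height
    azimuth = math.radians((u - width / 2.0) / width * fov_deg)
    x = slant * math.sin(azimuth)   # lateral offset, positive to the right
    y = slant * math.cos(azimuth)   # forward distance along the sonar axis
    return slant, azimuth, x, y
```

For example, with a 512 × 512 image, a 40 m maximum range, and a 120° fan, the image centre maps to a point 20 m straight ahead of the transducer.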

[0116] During the close-range guidance phase, the system enters a precision alignment mode, where the underwater platform simultaneously acquires RGB images from an underwater optical camera and acoustic images from a multibeam forward-looking sonar or sparse array imaging sonar. To address the issues of spatial parallax in heterogeneous sensors and the failure of a single visual system due to water turbidity, this invention provides a method for calculating the relative position of an autonomous underwater vehicle (AUV) and controlling its docking based on a deep fusion network of acoustic and optical images. This method specifically includes:

[0117] S301: acquire an RGB optical image from the underwater optical camera and an acoustic image from the multibeam forward-looking sonar or sparse array imaging sonar.

[0118] S302: employ a dual-branch feature fusion network based on spatial cross-attention, extracting heterogeneous features through an optical feature branch and an acoustic feature branch, respectively;

[0119] In this embodiment of the invention, Figure 3 shows a schematic diagram of the dual-branch feature fusion network architecture. The dual-branch feature fusion network includes an optical feature branch and an acoustic feature branch. The optical feature branch uses ResNet50 as the backbone network to extract the high-brightness spot, color texture, and edge contour features of the LED light source at the bow of the autonomous underwater vehicle. The acoustic feature branch uses ResNet18 as the backbone network.

[0120] Specifically, the method first performs independent feature extraction using a two-stream network. The pre-processed optical image and acoustic image are fed into the optical feature branch and the acoustic feature branch, respectively; the two branches are independent and do not share parameters. The optical feature branch uses ResNet50 as its backbone network, focusing on extracting the high-brightness spot, color texture, and edge contour features of the AUV's bow LED light source. The image first passes through a convolutional layer with a stride of 2, batch normalization, and ReLU activation, followed by downsampling through a max pooling layer. The result then passes through the four residual stages of ResNet50, and finally a convolution adjusts the number of channels to generate the optical feature map. The acoustic feature branch uses the more compact ResNet18 as its backbone network. The input image passes through the four residual stages of ResNet18 in sequence. Unlike the optical branch, this branch removes the max pooling operation in the first layer to preserve the geometric edge information of small targets in the sonar image. Finally, a convolution adjusts the number of channels to generate the acoustic feature map.

[0121] S303: calculate the correlation matrix of acoustic and optical features through a spatial cross-attention module, implicitly correct the physical parallax between sonar and camera at the feature layer and perform complementary enhancement, and, with the help of an underwater electromagnetic positioning device, directly calculate the position of the autonomous underwater vehicle relative to the underwater platform and control the autonomous underwater vehicle to complete the final docking.

[0122] In this embodiment, Figure 4 illustrates the principle by which the spatial cross-attention module calculates the correlation matrix of acoustic-optical features. Specifically, the extracted bimodal features are fed into the spatial cross-attention module for interactive fusion, with the aim of achieving soft alignment of acoustic-optical information within the feature space. First, the optical feature map $F_{opt}$ is mapped to a query vector $Q$ through a linear projection layer, and the acoustic feature map $F_{aco}$ is mapped to a key vector $K$ and a value vector $V$. The mapping is calculated as follows:

[0123] $$Q = F_{opt} W_Q, \qquad K = F_{aco} W_K, \qquad V = F_{aco} W_V$$

[0124] where $W_Q$, $W_K$, and $W_V$ are learnable weight matrices.

[0125] Next, the spatial cross-attention module calculates the spatial correlation between the optical query vector and the acoustic key vector, generating an attention weight matrix. This matrix characterizes the degree of association between each pixel region in the optical image and all regions in the acoustic image. If water turbidity blurs the optical features but the sonar has a clear echo at the corresponding physical location, the attention mechanism automatically assigns that location a higher weight. The attention weight matrix $A$ is calculated as follows:

[0126] $$A = \mathrm{softmax}\left(\frac{Q K^T}{\sqrt{d_k}}\right)$$

[0127] where $\sqrt{d_k}$ is a scaling factor, with $d_k$ the feature channel dimension, used to prevent the gradient from vanishing due to excessively large dot product results.

[0128] After the attention map is obtained, the weight matrix $A$ is applied to the acoustic value vector $V$ for weighted aggregation, yielding spatially aligned and enhanced acoustic features, which are then injected into the original optical features via a residual connection to generate the final fused feature $F_{fused}$. This process implicitly corrects for the physical parallax between the sonar and camera on the underwater platform. The fusion is calculated as follows:

[0129] $$F_{fused} = F_{opt} + A V$$
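The query/key/value mapping, scaled softmax, and residual injection described above can be sketched in plain Python on small token matrices. This is a minimal illustration, not the network in the filing: feature maps are assumed to be flattened to token-by-channel lists, the projection matrices are assumed square so the residual addition is valid, and all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(f_opt, f_aco, w_q, w_k, w_v):
    """Fuse optical tokens (queries) with acoustic tokens (keys and values).

    f_opt: N_opt x C, f_aco: N_aco x C, weights: C x C (assumed square).
    Returns the attention matrix and the fused optical features.
    """
    q, k, v = matmul(f_opt, w_q), matmul(f_aco, w_k), matmul(f_aco, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    attn = softmax_rows(scores)
    aligned = matmul(attn, v)                      # acoustic features per optical token
    fused = [[o + a for o, a in zip(ro, ra)] for ro, ra in zip(f_opt, aligned)]
    return attn, fused
```

With identity projections, an all-zero optical token attends uniformly to every acoustic token, so its fused value is simply the mean acoustic feature. This is a toy analogue of the acoustic branch filling in for a blurred optical region.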

[0130] Finally, the system performs position regression. The fused feature map $F_{fused}$ is reduced to a one-dimensional feature vector by a global average pooling (GAP) layer and fed into a regression head composed of a multilayer perceptron (MLP), which directly calculates the AUV's position coordinates $P$ relative to the underwater platform:

[0131] $$P = \mathrm{MLP}\big(\mathrm{GAP}(F_{fused})\big)$$
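A toy version of the pooling and regression head is sketched below. The single hidden layer, the ReLU activation, and the three-element output are assumptions for illustration; the source specifies only global average pooling followed by a multilayer perceptron.

```python
def global_average_pool(feature_map):
    """Average over the token (spatial) axis: N x C -> C."""
    n = len(feature_map)
    return [sum(col) / n for col in zip(*feature_map)]


def dense(x, weights, bias):
    """Fully connected layer; weights has shape out x in."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def regress_position(feature_map, w1, b1, w2, b2):
    """GAP followed by a two-layer perceptron (assumed depth and activation)."""
    pooled = global_average_pool(feature_map)
    hidden = [max(0.0, h) for h in dense(pooled, w1, b1)]   # ReLU
    return dense(hidden, w2, b2)
```

In a trained network the weights would be learned jointly with the two branches and the attention module; here they are plain arguments so the data flow is easy to follow.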

[0132] In a further preferred embodiment of the present invention, to ensure a smooth and reliable transition between the guidance system's three operating modes (long-range, medium-range, and short-range) and to prevent control command oscillations caused by sensor measurement noise or sudden environmental changes, the present invention provides an intelligent switching control strategy based on a finite state machine. This strategy does not rely solely on instantaneous distance values for hard switching; instead, it introduces a dual verification mechanism consisting of a distance hysteresis interval and detection confidence gating. Specifically, the strategy predefines three discrete operating states corresponding to the long-range, medium-range, and short-range modes, respectively, and presets a long-to-medium-range switching threshold, a medium-to-short-range switching threshold, and a hysteresis tolerance to prevent jumps at the critical points. Figure 5 is a schematic diagram of the implementation process of the method for achieving a smooth transition between the long-range, medium-range, and short-range operating modes using the finite-state-machine-based intelligent switching control strategy. The method specifically includes:

[0133] The condition for switching from long-range mode to medium-range mode is: the estimated distance output by the USBL filter is less than the long-to-medium-range switching threshold for N1 consecutive frames. As shown in Figure 5, the system predefines three discrete operating states: a long-range guidance state that relies solely on USBL adaptive filtering, a medium-range guidance state that relies on the forward-looking sonar and the YOLO-series lightweight network model, and a short-range guidance state based on the dual-branch feature fusion network. The system presets two key distance switching thresholds, a long-to-medium-range switching threshold and a medium-to-short-range switching threshold, together with a hysteresis tolerance to prevent jumps at the critical points.

[0134] The condition for switching from medium-range mode to short-range mode is: the sonar-calculated range is less than the medium-to-short-range switching threshold, and the target detection confidence is higher than the safety threshold;

[0135] The condition for reverting from short-range mode to medium-range mode is: the distance solved by the deep fusion network (i.e., the dual-branch feature fusion network) exceeds the sum of the medium-to-short-range switching threshold and the hysteresis tolerance, or the regression confidence remains below the minimum threshold throughout a continuous time window T.

[0136] In this embodiment, as the AUV approaches the underwater platform, the decision to switch from the long-range state $S_1$ to the medium-range state $S_2$ is based primarily on the smoothed distance calculated from USBL and its time-domain stability. The system activates the forward-looking sonar and enters medium-range mode if and only if the estimated distance $\hat{d}_{USBL}$ output by the adaptive Kalman filter is less than the long-to-medium-range switching threshold $D_{LM}$ for $N_1$ consecutive processing frames. This forward switching decision function $T_{1\to 2}$ can be expressed as:

[0137] $$T_{1\to 2}(k) = \prod_{i=0}^{N_1-1} I\big(\hat{d}_{USBL}(k-i) < D_{LM}\big)$$

[0138] where $I(\cdot)$ is an indicator function that takes the value 1 when the condition is met and 0 otherwise, and $k$ is the current moment.

[0139] When the system is in the medium-range state $S_2$ and the AUV continues to approach the dock, the decision to switch from $S_2$ to the short-range state $S_3$ is more stringent, requiring the distance condition and the perception confidence condition to be met simultaneously. The physical distance $d_{sonar}$ calculated from the sonar image by the YOLO-series lightweight network must be less than the medium-to-short-range switching threshold $D_{MS}$, and the target detection confidence $c_{det}$ output by the network must be higher than the preset safety threshold $c_{safe}$. This logic ensures that the computationally intensive dual-branch feature fusion network is activated only when the sonar has clearly locked onto the target and there is no severe multipath interference. With $I(\cdot)$ the indicator function and $k$ the current moment, the joint decision logic is as follows:

[0140] $$T_{2\to 3}(k) = I\big(d_{sonar}(k) < D_{MS}\big) \cdot I\big(c_{det}(k) > c_{safe}\big)$$

[0141] Furthermore, to handle anomalies such as a temporary retreat of the AUV caused by underwater disturbances or a loss of sensor features, this invention incorporates reverse fallback logic with hysteresis. If the AUV moves away from the underwater platform, the system does not degrade immediately; a state rollback is executed only when the measured distance exceeds the sum of the corresponding switching threshold and the hysteresis tolerance. In addition, in the short-range state, if the feature regression confidence of the dual-branch feature fusion network remains below the minimum threshold for T consecutive seconds, the system triggers a forced circuit-breaker mechanism and immediately downgrades to the medium-range state, in which position is maintained using the sonar, which has stronger anti-interference capability. The abnormal fallback decision logic can be expressed as follows:

[0142] $$T_{3\to 2}(k) = I\big(d_{fus}(k) > D_{MS} + \Delta d\big) \lor \prod_{i=0}^{M-1} I\big(c_{reg}(k-i) < c_{min}\big)$$

[0143] where $d_{fus}$ is the distance solved by the dual-branch feature fusion network, $D_{MS}$ is the medium-to-short-range switching threshold, $\Delta d$ is the hysteresis tolerance, $c_{reg}$ is the regression confidence, $c_{min}$ is the minimum threshold, and $M$ is the total number of image frames acquired and processed by the system within the decision time window $T$.
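The switching rules above can be sketched as a small state machine. The 100 m and 20 m thresholds, the 5-frame persistence, and the 0.2 minimum confidence follow the embodiment; the 5 m hysteresis tolerance, the 0.6 safety threshold (picked from the stated 0.4 to 0.8 range), the 4-frame breaker window, the medium-to-long rollback rule, and the class and parameter names are assumptions made for illustration.

```python
from collections import deque

LONG, MID, SHORT = "long", "mid", "short"


class GuidanceFSM:
    """Three-state guidance mode switch with hysteresis and confidence gating."""

    def __init__(self, d_lm=100.0, d_ms=20.0, hyst=5.0, n1=5,
                 c_safe=0.6, c_min=0.2, m=4):
        self.state = LONG
        self.d_lm, self.d_ms, self.hyst = d_lm, d_ms, hyst
        self.c_safe, self.c_min = c_safe, c_min
        self.below_lm = deque(maxlen=n1)   # last n1 "distance < d_lm" flags
        self.low_conf = deque(maxlen=m)    # last m "confidence < c_min" flags

    def step(self, distance, confidence=1.0):
        """Feed one frame (distance in metres, confidence in [0, 1])."""
        if self.state == LONG:
            self.below_lm.append(distance < self.d_lm)
            if len(self.below_lm) == self.below_lm.maxlen and all(self.below_lm):
                self.state = MID
        elif self.state == MID:
            if distance > self.d_lm + self.hyst:       # assumed rollback rule
                self.state = LONG
                self.below_lm.clear()
            elif distance < self.d_ms and confidence > self.c_safe:
                self.state = SHORT
                self.low_conf.clear()
        else:  # SHORT
            self.low_conf.append(confidence < self.c_min)
            breaker = (len(self.low_conf) == self.low_conf.maxlen
                       and all(self.low_conf))
            if distance > self.d_ms + self.hyst or breaker:
                self.state = MID
        return self.state
```

A single reading of 101 m in the middle of the approach resets the 5-frame run rather than switching modes, and once in medium-range mode a reading of 102 m does not cause a rollback because it is still inside the hysteresis band.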

[0144] By employing the above strategies, the system can maximize its engineering robustness in complex underwater acoustic environments while ensuring docking accuracy.

[0145] This embodiment constructs an acoustic-optical fusion guidance system and method for autonomous dynamic docking of underwater unmanned systems. Its physical environment mainly consists of a platform-side perception and guidance subsystem and an AUV-side coordination subsystem. In terms of hardware deployment, the underwater platform, as the sensing entity, has a USBL positioning array installed in its open top area to establish a long-range underwater acoustic communication link. A multi-beam forward-looking sonar is installed above the platform entrance to acquire two-dimensional acoustic images of the forward fan-shaped area at medium to close range. Simultaneously, an underwater optical camera is installed above the platform entrance to capture visual features at close range. All the above sensors are connected to a computer via watertight cables, which houses the algorithm module described in this invention. On the AUV side, a USBL transponder beacon with a frequency matched to that of the platform is installed in its dorsal communication section, and an LED light source is integrated in the bow to assist visual recognition.

[0146] After the guidance mission begins, the system first enters the long-range navigation phase. When the AUV is more than 100 meters away from the dock, the USBL positioning array of the platform-side perception and guidance subsystem periodically transmits interrogation signals, and the USBL transponder beacon on the autonomous underwater vehicle responds immediately upon receiving them. After receiving the raw acoustic signals, the computational control unit calculates the AUV's slant range, azimuth, and elevation angle. To address the data jumps caused by multipath effects in shallow water, the system runs an adaptive robust Kalman filter algorithm. This algorithm uses the IGGIII equivalent weight function to construct a robust factor, automatically reducing the weight of outliers with excessively large residuals, and uses the Sage-Husa estimator to estimate the measurement noise covariance matrix in real time, thereby outputting the smoothed AUV position coordinates, which are then used to guide the AUV toward the dock via an underwater acoustic communicator.

[0147] When the filtered USBL distance is less than 100 meters and remains stable for 5 consecutive frames, the system enters the mid-range guidance phase according to the switching logic shown in Figure 5, at which point the multi-beam forward-looking sonar is activated. The computing and control unit runs the YOLO-series lightweight network shown in Figure 2. The system acquires sonar echo data in real time and generates a two-dimensional acoustic image such as that shown in Figure 6. Figure 6 illustrates multi-beam forward-looking sonar image sampling and sonar target tracking in an embodiment of the present invention: Figure 6(a) is a schematic diagram of multibeam forward-looking sonar image sampling, and Figure 6(b) is a schematic diagram of sonar target tracking. The image in Figure 6 shows the bright echo and acoustic shadow characteristics of an AUV. The two-dimensional acoustic image is input into a standard YOLO-series lightweight network, which in this embodiment can be the YOLOv8 lightweight network. This network uses the CSPDarknet53 backbone network to extract features, effectively identifying the acoustic characteristics of the AUV. After the network outputs the two-dimensional bounding box of the AUV in the sonar image, the sonar physical range and beam opening angle are used to calculate the two-dimensional planar coordinates of the AUV relative to the recovery platform. The AUV uses these coordinates to correct its horizontal heading deviation.

[0148] As shown in Figure 5, when the sonar-calculated distance is less than 20 meters and the detection confidence score of the YOLO-series lightweight network exceeds a threshold chosen in the range 0.4 to 0.8 for three consecutive frames, the system determines that the close-range guidance conditions are met and simultaneously activates the underwater optical camera and the forward-looking sonar for identification. At this time, the computing and control unit runs the dual-branch feature fusion network shown in Figure 3. Figure 7 shows an RGB optical image captured by an underwater optical camera in an embodiment of the present invention, in which the high-brightness LED spot on the bow of an AUV is visible in turbid water. Specifically, the optical feature branch uses a ResNet50 backbone network to process the RGB optical image shown in Figure 7 and extract visual features including the LED light spot; the acoustic feature branch uses a ResNet18 backbone network with the first-layer pooling removed to process the sonar image. Subsequently, the spatial cross-attention module shown in Figure 4 maps optical features to query vectors and acoustic features to key vectors and calculates the acoustic-optical spatial correlation matrix. Even when water turbidity blurs the visual light spots, this correlation matrix assigns high weights based on the clear acoustic echo locations, thus completing and enhancing the visual features with acoustic features. Finally, the fused features are processed by global pooling and a multilayer perceptron regression head, directly outputting the AUV's position coordinates relative to the recovery platform. This drives the AUV to complete the final recovery.

[0149] Throughout the guidance process, the system strictly follows the intelligent switching control strategy shown in Figure 5. It also incorporates a distance hysteresis mechanism to prevent control oscillations caused by measurement noise at the critical distances of 100 meters and 20 meters. Specifically, during the close-range acoustic-optical fusion phase, if the regression confidence of the dual-branch feature fusion network falls below a preset safety threshold (e.g., 0.2) for two consecutive seconds, the system triggers an abnormal circuit breaker and immediately degrades to mid-range mode. At this point, the system reuses the results from the forward-looking sonar's YOLO-series lightweight network to maintain the AUV's current position. After replanning the path, it attempts to switch back to close-range mode, thus ensuring the safety and robustness of the entire operation.

[0150] Example 2

[0151] This invention also provides an acoustic-optical fusion guidance method for autonomous dynamic docking of underwater unmanned systems. Specifically, this method includes:

[0152] S10, Long-range phase: The platform-side perception and guidance subsystem acquires the acoustic positioning data of the autonomous underwater vehicle through the USBL array, uses an adaptive robust Kalman filter algorithm to suppress multipath noise, outputs smooth autonomous underwater vehicle position coordinates, and guides the autonomous underwater vehicle to approach the underwater platform.

[0153] S20, mid-range phase: Multibeam forward-looking sonar or sparse array imaging sonar is activated, and YOLO series lightweight network is used to identify the target autonomous underwater vehicle from two-dimensional acoustic images. Combined with Transformer tracking network, the two-dimensional coordinates of the autonomous underwater vehicle are output to guide the autonomous underwater vehicle to correct its horizontal heading.

[0154] S30, close-range phase: the optical camera and sonar are activated simultaneously, and a dual-branch feature fusion network based on spatial cross-attention is employed to achieve implicit alignment and complementary enhancement of acoustic and optical data at the feature layer; supplemented by an underwater electromagnetic positioning device, the relative position of the autonomous underwater vehicle is calculated and the vehicle is controlled to complete the docking.

[0155] S40, throughout the entire guidance process, uses an intelligent switching control strategy based on a finite state machine to achieve smooth transitions between each stage, and triggers a rollback mechanism when an anomaly is detected.

[0156] This invention constructs a three-stage adaptive guidance strategy based on relative distance: far-field USBL filtering guidance, mid-field sonar target feature guidance, and near-field acoustic-optical deep fusion guidance. This strategy avoids complex and ineffective calculations at long range while ensuring the accuracy of terminal docking. Furthermore, it employs a dual-branch feature fusion network based on a spatial cross-attention mechanism to calculate the spatial correlation between acoustic and optical image information at the feature layer. This implicitly corrects the physical parallax between the sonar and camera without complex external calibration, achieving precise alignment of heterogeneous features. In addition, by establishing a complementary acoustic-optical feature enhancement mechanism, this invention can automatically guide the network to lock onto the target using clear acoustic feature weights when visual features are missing due to turbid water or uneven lighting. This significantly improves the docking success rate of the system in complex underwater acoustic channels and low-visibility environments. The embodiments of this invention are also applicable to underwater unmanned systems such as remotely operated vehicles (ROVs).

[0157] In summary, this invention provides an acoustic-optical fusion guidance system and method for autonomous dynamic docking of underwater unmanned systems. The embodiments of this invention consist of a platform-side perception and guidance subsystem, an AUV-side coordination subsystem, and an intelligent switching control module. It integrates various heterogeneous sensors, including USBL, multi-beam forward-looking sonar or sparse array imaging sonar, underwater optical cameras, or underwater lidar, supplemented by an underwater electromagnetic positioning device. This constructs a segmented adaptive acoustic-optical fusion guidance technology system for long-range, medium-range, and short-range underwater unmanned systems. Through multi-source information fusion, the reliability, stability, and accuracy of autonomous underwater vehicle (AUV) guidance are enhanced.

[0158] It should be noted that, for the sake of simplicity, the foregoing embodiments are all described as a series of actions. However, those skilled in the art should understand that the present invention is not limited to the described order of actions, as some steps may be performed in other orders or simultaneously according to the present invention. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily essential to the present invention.

[0159] The above embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit the scope of protection of the invention. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on these embodiments, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art can still combine, add, delete, or otherwise adjust the features of the various embodiments of the present invention according to the circumstances without conflict or creative effort, thereby obtaining different technical solutions that do not fundamentally depart from the concept of the present invention. These technical solutions also fall within the scope of protection of the present invention.

Claims

1. An acoustic-optical fusion guidance system for autonomous dynamic docking of underwater unmanned systems, characterized in that: include: The platform-side perception and guidance subsystem is deployed on the underwater platform. The platform-side perception and guidance subsystem adopts a segmented adaptive guidance strategy based on relative distance, which divides the docking process into three stages: long distance, medium distance and short distance, and calls the corresponding adaptive guidance method for each stage. The AUV end-cooperation subsystem is deployed on the autonomous underwater vehicle and includes a USBL transponder beacon and a high-brightness LED light source cooperation identification device. The intelligent switching control module, based on a finite state machine-based intelligent switching control strategy, enables a smooth transition between three working modes: long-range, medium-range, and short-range. The platform-side sensing and guidance subsystem and the AUV-side coordination subsystem interact through acoustic signals to jointly complete the docking task.

2. The acoustic-optical fusion guidance system for autonomous dynamic docking of underwater unmanned systems as described in claim 1, characterized in that the platform-side perception and guidance subsystem includes: a USBL positioning array, used in the long-range stage to receive and process acoustic signals from the AUV-side cooperation subsystem and obtain the initial position information of the autonomous underwater vehicle; a multibeam forward-looking sonar, used in the medium-range and short-range stages to scan the water ahead and acquire two-dimensional acoustic images containing echoes of the autonomous underwater vehicle; an underwater optical camera, used to acquire optical images of the autonomous underwater vehicle at short range; and a computational control unit, which calls the corresponding adaptive guidance method for each stage, triggers execution of the segmented adaptive guidance algorithm, and outputs guidance control commands.

3. The acoustic-optical fusion guidance system for autonomous dynamic docking of underwater unmanned systems as described in claim 2, characterized in that the computational control unit calling the corresponding adaptive guidance method for each stage includes: in the long-range stage, outputting smoothed position coordinates of the autonomous underwater vehicle based on USBL positioning data and an adaptive robust Kalman filter algorithm; in the medium-range stage, outputting two-dimensional coordinates of the autonomous underwater vehicle based on forward-looking sonar images and a lightweight target detection network combined with a Transformer tracking network, so as to guide the autonomous underwater vehicle to correct its horizontal heading; and in the short-range stage, using a deep fusion network based on acoustic and optical images to calculate the relative position of the autonomous underwater vehicle and control it to complete the docking.

4. The acoustic-optical fusion guidance system for autonomous dynamic docking of underwater unmanned systems as described in claim 3, characterized in that outputting smoothed position coordinates of the autonomous underwater vehicle based on USBL positioning data and an adaptive robust Kalman filter algorithm includes: acquiring USBL positioning data of the autonomous underwater vehicle and establishing a state-space model of the autonomous underwater vehicle based on the USBL positioning data; constructing an IGGIII robust estimation mechanism, in which an IGGIII three-segment equivalent weight function is introduced to build a robust factor and the measurement residuals are processed robustly by that weight function, such that a residual exceeding the elimination threshold is judged to be an outlier and is down-weighted or eliminated; and introducing Sage-Husa adaptive estimation to estimate and correct the measurement noise covariance matrix online in real time, thereby outputting smoothed position coordinates of the autonomous underwater vehicle with outliers removed, which drive the autonomous underwater vehicle to adjust its heading angle and depth so that it stably approaches the sonar range of the underwater platform.
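As an illustration only, and not part of the claim language, the filtering idea of claim 4 can be sketched for a single position coordinate. The sketch assumes a scalar random-walk state model, hypothetical IGGIII thresholds `k0` and `k1`, a hypothetical forgetting factor `b`, and a fixed Sage-Husa weight `d = 1 - b` (the steady-state value of the usual time-varying weight). The names `igg3_weight` and `RobustAdaptiveKF1D` are invented for this example, and the state-space model actually used in the application is not specified in this section.

```python
from dataclasses import dataclass


def igg3_weight(v: float, k0: float = 2.0, k1: float = 5.0) -> float:
    """IGGIII three-segment equivalent weight for a standardized residual v >= 0.

    k0 and k1 are assumed thresholds, not values taken from the application.
    """
    if v <= k0:
        return 1.0  # keep segment: full weight
    if v <= k1:
        # down-weight segment: weight decays smoothly from 1 towards 0
        return (k0 / v) * ((k1 - v) / (k1 - k0)) ** 2
    return 0.0  # elimination segment: treated as an outlier


@dataclass
class RobustAdaptiveKF1D:
    x: float              # state estimate (one position coordinate, metres)
    p: float = 1.0        # state variance
    q: float = 0.01       # process noise variance (assumed)
    r: float = 0.25       # measurement noise variance, adapted online
    b: float = 0.98       # Sage-Husa forgetting factor (assumed)
    r_min: float = 1e-4   # floor that keeps r positive

    def step(self, z: float) -> float:
        p_pred = self.p + self.q        # predict (random walk: x unchanged)
        e = z - self.x                  # innovation
        s = p_pred + self.r             # innovation variance
        w = igg3_weight(abs(e) / s ** 0.5)
        if w == 0.0:
            self.p = p_pred             # outlier eliminated: prediction only
            return self.x
        if w == 1.0:
            # Sage-Husa style update of r, using only fully trusted
            # measurements so outliers do not contaminate the noise estimate.
            d = 1.0 - self.b
            self.r = max((1.0 - d) * self.r + d * (e * e - p_pred), self.r_min)
        r_eq = self.r / w               # robust factor inflates r when w < 1
        gain = p_pred / (p_pred + r_eq)
        self.x += gain * e
        self.p = (1.0 - gain) * p_pred
        return self.x
```

Under these assumptions, feeding a sequence of fixes near 10 m followed by a single multipath-like fix at 60 m leaves the estimate unchanged at the outlier step, because the standardized residual falls in the elimination segment. A full implementation would run a vector state (position and velocity) with matrix forms of the same updates.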

5. The acoustic-optical fusion guidance system for autonomous dynamic docking of underwater unmanned systems as described in claim 3, characterized in that, when the two-dimensional coordinates of the autonomous underwater vehicle are output based on forward-looking sonar images and a lightweight target detection network combined with a Transformer tracking network, the lightweight target detection network is a standard YOLO lightweight network architecture, and continuous localization of underwater acoustic targets is achieved through acoustic target tracking based on the Transformer network model.

6. The acoustic-optical fusion guidance system for autonomous dynamic docking of underwater unmanned systems as described in claim 5, characterized in that outputting the two-dimensional coordinates of the autonomous underwater vehicle based on forward-looking sonar images and a lightweight target detection network combined with a Transformer tracking network includes: acquiring multibeam forward-looking sonar images or sparse array imaging sonar images in real time; integrating the multibeam forward-looking sonar images or sparse array imaging sonar images into a two-dimensional acoustic image, and using the two-dimensional acoustic image as the input of the lightweight target detection network; performing end-to-end target detection on the two-dimensional acoustic image using a standard YOLO-series lightweight network architecture; achieving continuous localization of underwater acoustic targets through acoustic target tracking based on the Transformer network model; and calculating, from the target detection results of the sonar images, the two-dimensional coordinates of the autonomous underwater vehicle in the sonar scanning plane, these coordinates being used to guide the autonomous underwater vehicle to correct its course in the horizontal plane and approach the entrance of the underwater platform.
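The last step of claim 6, converting a detection result into coordinates in the sonar scanning plane, can be illustrated with a small geometric sketch. It assumes, purely for illustration, that the two-dimensional acoustic image is a polar image whose columns are beams spread evenly across the field of view and whose rows are range bins. The function names, the image layout, and all parameter values are hypothetical and do not come from the application.

```python
import math


def box_center(box):
    """Centre (u, v) of a detection box given as (u_min, v_min, u_max, v_max)."""
    u_min, v_min, u_max, v_max = box
    return (u_min + u_max) / 2.0, (v_min + v_max) / 2.0


def sonar_pixel_to_xy(u, v, n_beams, n_bins, fov_deg, max_range_m):
    """Map a pixel of a polar sonar image to (x, y) in the scanning plane.

    u: column coordinate along the beam axis, 0 <= u <= n_beams
    v: row coordinate along the range axis, 0 <= v <= n_bins
    Returns x (forward, metres) and y (positive to one side, metres).
    """
    bearing = math.radians((u / n_beams - 0.5) * fov_deg)  # 0 at image centre
    rng = v / n_bins * max_range_m
    return rng * math.cos(bearing), rng * math.sin(bearing)


def heading_error_deg(x, y):
    """Horizontal bearing of the target relative to the sonar axis."""
    return math.degrees(math.atan2(y, x))
```

With an assumed 512-beam, 500-bin image covering 90 degrees and 40 m, a box centred on column 256 and row 250 maps to a point on the sonar axis at half the maximum range. A Cartesian (already scan-converted) image would need a different mapping.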

7. The acoustic-optical fusion guidance system for autonomous dynamic docking of underwater unmanned systems as described in claim 3, characterized in that using a deep fusion network based on acoustic and optical images to calculate the relative position of the autonomous underwater vehicle and control it to complete the docking includes: acquiring RGB optical images from the underwater optical camera, and acoustic images from the multibeam forward-looking sonar or sparse array imaging sonar; adopting a dual-branch feature fusion network based on spatial cross-attention, in which heterogeneous features are extracted by an optical feature branch and an acoustic feature branch respectively; calculating the correlation matrix of the acoustic and optical features with the spatial cross-attention module, so that the physical parallax between the sonar and the camera is implicitly corrected and complementary enhancement is performed at the feature layer; and, with the aid of the underwater electromagnetic positioning device, directly calculating the position of the autonomous underwater vehicle relative to the underwater platform and controlling the autonomous underwater vehicle to complete the final docking.
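The correlation-matrix step of claim 7 can be illustrated with a minimal cross-attention sketch. It assumes that optical feature tokens act as queries and acoustic feature tokens act as keys and values, and it omits the learned projection layers that a trained network would have. The claim does not state the attention direction or the layer details, so both are assumptions, and the function names are hypothetical. Plain Python lists are used so the example has no dependencies.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(optical, acoustic):
    """Fuse optical tokens (N_o x d) with acoustic tokens (N_a x d).

    Returns (fused, attn): attn is the N_o x N_a correlation matrix, and
    fused adds the attention-weighted acoustic context to each optical token.
    """
    d = len(optical[0])
    scale = 1.0 / math.sqrt(d)
    attn = [
        softmax([scale * sum(a * b for a, b in zip(q, k)) for k in acoustic])
        for q in optical
    ]
    fused = []
    for q, weights in zip(optical, attn):
        # Weighted sum of acoustic tokens, one value per feature channel.
        context = [sum(w * k[j] for w, k in zip(weights, acoustic))
                   for j in range(d)]
        fused.append([qj + cj for qj, cj in zip(q, context)])  # residual add
    return fused, attn
```

Because every optical location can attend to every acoustic location, no explicit geometric registration between the two sensors is required in this formulation, which is one way to read the "implicit correction" of parallax described in the claim.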

8. The acoustic-optical fusion guidance system for autonomous dynamic docking of underwater unmanned systems as described in claim 7, characterized in that the dual-branch feature fusion network includes an optical feature branch and an acoustic feature branch; the optical feature branch uses ResNet50 as its backbone network to extract the high-brightness spot, color texture and edge contour features of the LED light source at the bow of the autonomous underwater vehicle; and the acoustic feature branch uses ResNet18 as its backbone network.

9. The acoustic-optical fusion guidance system for autonomous dynamic docking of underwater unmanned systems as described in claim 3, characterized in that the finite-state-machine intelligent switching control strategy introduces a dual verification mechanism combining a distance hysteresis interval with detection confidence, and predefines three discrete working states corresponding to the long-range, medium-range and short-range modes respectively, together with a preset long-to-medium-range switching threshold, a medium-to-short-range switching threshold, and a hysteresis tolerance that prevents jumping at the critical points; the smooth transition among the long-range, medium-range and short-range working modes achieved by the finite-state-machine intelligent switching control strategy includes: the condition for switching from long-range mode to medium-range mode is that the estimated distance output by the USBL filter is less than the long-to-medium-range switching threshold for N1 consecutive frames; the condition for switching from medium-range mode to short-range mode is that the range calculated from the sonar is less than the medium-to-short-range switching threshold and the target detection confidence is higher than the safety threshold; and the condition for reverting from short-range mode to medium-range mode is that the distance solved by the deep fusion network exceeds [a certain value], or the regression confidence is below the minimum threshold within a continuous time window T.
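The switching rules of claim 9 can be sketched as a small finite state machine. All numeric values below are hypothetical placeholders, the time window T is approximated by a frame count, and the fallback distance is assumed to be the medium-to-short-range threshold plus the hysteresis tolerance, because the exact expression is not legible in this text. The class and field names are invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    LONG = "long"
    MEDIUM = "medium"
    SHORT = "short"


@dataclass
class SwitchConfig:
    d_long_medium: float = 100.0   # long-to-medium threshold, m (assumed)
    d_medium_short: float = 10.0   # medium-to-short threshold, m (assumed)
    hysteresis: float = 2.0        # hysteresis tolerance, m (assumed)
    n1: int = 5                    # consecutive frames required (assumed)
    conf_safe: float = 0.8         # detection confidence safety threshold
    conf_min: float = 0.3          # minimum regression confidence
    window_frames: int = 10        # frame count standing in for window T


class DockingFSM:
    def __init__(self, cfg: SwitchConfig):
        self.cfg = cfg
        self.mode = Mode.LONG
        self._below = 0      # consecutive frames under the long-medium threshold
        self._low_conf = 0   # consecutive low-confidence frames in SHORT mode

    def update(self, distance: float, confidence: float = 1.0) -> Mode:
        """Advance one frame using the active stage's distance estimate."""
        c = self.cfg
        if self.mode is Mode.LONG:
            self._below = self._below + 1 if distance < c.d_long_medium else 0
            if self._below >= c.n1:
                self.mode, self._below = Mode.MEDIUM, 0
        elif self.mode is Mode.MEDIUM:
            # Dual check: distance AND detection confidence must both pass.
            if distance < c.d_medium_short and confidence > c.conf_safe:
                self.mode, self._low_conf = Mode.SHORT, 0
        else:
            self._low_conf = self._low_conf + 1 if confidence < c.conf_min else 0
            # Assumed fallback distance: threshold plus hysteresis tolerance.
            too_far = distance > c.d_medium_short + c.hysteresis
            if too_far or self._low_conf >= c.window_frames:
                self.mode = Mode.MEDIUM
        return self.mode
```

With these placeholder values, a distance that oscillates between 9 m and 11 m after entering short-range mode does not cause mode flapping, because the fallback only fires above 12 m. A fallback from medium-range to long-range mode is not described in the claim and is therefore not modelled.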

10. An acoustic-optical fusion guidance method for autonomous dynamic docking of underwater unmanned systems, implemented using the acoustic-optical fusion guidance system for autonomous dynamic docking of underwater unmanned systems as described in any one of claims 1-9, characterized in that the method specifically includes: S10, long-range stage: the platform-side perception and guidance subsystem acquires acoustic positioning data of the autonomous underwater vehicle through the USBL array, uses an adaptive robust Kalman filter algorithm to suppress multipath noise, outputs smoothed position coordinates of the autonomous underwater vehicle, and guides the autonomous underwater vehicle to approach the underwater platform; S20, medium-range stage: the multibeam forward-looking sonar or sparse array imaging sonar is activated, a YOLO-series lightweight network is used to identify the target autonomous underwater vehicle in the two-dimensional acoustic images, and, in combination with a Transformer tracking network, the two-dimensional coordinates of the autonomous underwater vehicle are output to guide the autonomous underwater vehicle to correct its horizontal heading; S30, short-range stage: the optical camera and the sonar are activated simultaneously, a dual-branch feature fusion network based on spatial cross-attention is employed to achieve implicit alignment and complementary enhancement of the acoustic and optical data at the feature layer, and, supplemented by an underwater electromagnetic positioning device, the relative position of the autonomous underwater vehicle is calculated and the autonomous underwater vehicle is controlled to complete the docking; and S40, throughout the entire guidance process, an intelligent switching control strategy based on a finite state machine is used to achieve smooth transitions between the stages, and a rollback mechanism is triggered when an anomaly is detected.