A robust collision avoidance decision-making method for unmanned surface vehicle considering ship load change

By constructing dynamic context sequences and GRU implicit feature extraction, combined with the SAC decision framework and residual correction mechanism, the collision avoidance strategy of unmanned vessels is adaptively optimized, which solves the problem of time-varying dynamic parameters caused by changes in vessel load and improves the adaptability and robustness of the collision avoidance strategy.

CN122195014A · Pending · Publication Date: 2026-06-12 · JIMEI UNIV


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: JIMEI UNIV
Filing Date: 2026-05-15
Publication Date: 2026-06-12


IPC: G05D1/43; G05D1/622; G05D109/30

AI Tagging

Application Domain

Vehicle position/course/altitude control Position/direction control


Abstract

The present application relates to a robust collision avoidance decision-making method for unmanned surface vessels that considers ship load variation, and belongs to the field of unmanned vessel collision avoidance. The method constructs a collision avoidance strategy state space comprising the own-ship state, the target ship states, and the target point state; extracts a dynamic context sequence formed from historical control actions and the ship's motion response characteristics, encodes it with a gated recurrent unit (GRU) to obtain implicit dynamic features, and generates adaptive context features through a residual correction mechanism; and combines the SAC framework with a masked attention mechanism to fuse multi-target-ship interaction information with the dynamic features, outputting a collision avoidance action that is mapped to a rudder angle command for execution. The present application can effectively adapt to changes in dynamic characteristics caused by load variation and improve the collision avoidance success rate and decision robustness of unmanned vessels in multi-target-ship encounters under dynamically changing loads, and is applicable to autonomous navigation in intelligent shipping.


Description

Technical Field

[0001] This invention belongs to the field of unmanned surface vessel (USV) collision avoidance, and specifically relates to a robust collision avoidance decision-making method for USVs that takes changes in ship load into account.

Background Technology

[0002] With the widespread application of unmanned surface vehicles (USVs) in intelligent shipping, their autonomous navigation safety in complex marine environments has attracted considerable attention. During USV transportation, loading and unloading operations often cause dynamic changes in the vessel's load. These load changes alter the vessel's mass distribution, inertial characteristics, hydrodynamic parameters, and maneuvering response characteristics, so that the USV's motion model becomes significantly time-varying and uncertain. This places higher demands on the accuracy, stability, and robustness of its collision avoidance decisions.

[0003] In recent years, autonomous collision avoidance methods for unmanned surface vessels (USVs) based on deep reinforcement learning (DRL) have gradually become a research hotspot. These methods, through continuous interaction between the agent and the environment, utilize reward signals to guide iterative policy optimization, achieving an end-to-end mapping from environmental perception to control decision-making. They possess strong autonomous learning capabilities and adaptability to complex scenarios. Compared to traditional rule-based, model predictive control, or artificial potential field-based methods, DRL methods exhibit better flexibility and decision-making performance in complex traffic environments. However, most existing studies assume that the ship dynamics model parameters are fixed and typically conduct training and testing only under single load conditions or idealized scenarios, lacking sufficient consideration of dynamic changes in ship loads.

[0004] In actual navigation, increases or decreases in vessel load can affect the maneuverability and dynamic response characteristics of unmanned surface vessels (USVs). For example, load changes may lead to lag in steering performance, decreased braking performance, altered heading stability, and increased trajectory tracking errors. When collision avoidance decision models are designed based on fixed parameter assumptions, a mismatch between the strategy and the actual dynamic response can easily occur. This can cause collision avoidance strategies that are effective under training conditions to degrade in actual operation, potentially resulting in delayed collision avoidance actions, excessive trajectory deviations, or control command instability, threatening the navigation safety of USVs. Furthermore, in complex sea states and multi-target vessel interaction scenarios, load changes and external disturbances often overlap, further increasing system uncertainty and making it difficult to guarantee navigation safety using traditional methods.

[0005] To improve the engineering applicability of collision avoidance methods for unmanned surface vessels (USVs) under complex conditions, some research has begun to focus on collision avoidance decision-making under marine environmental disturbances. However, most methods still primarily model and compensate for external environmental disturbances such as wind, waves, and currents, while insufficiently considering the time-varying internal dynamics caused by dynamic changes in the vessel's own load. Especially within the deep reinforcement learning framework, without an effective mechanism for representing and adapting to the dynamic differences between load conditions, the learned strategies typically perform well only under specific conditions and struggle to maintain stable collision avoidance performance across load conditions. Therefore, how to construct a robust collision avoidance decision-making method for USVs that fully considers the dynamic characteristics of vessel load changes and still maintains strong safety under various typical load change scenarios has become a key technical problem that urgently needs to be solved in current research on autonomous navigation of USVs.

Summary of the Invention

[0006] The purpose of this invention is to address the shortcomings of existing solutions in unmanned surface vessel (USV) collision avoidance, which fail to adequately consider the dynamic changes in ship loads. These shortcomings include time-varying ship dynamic parameters, mismatch between the decision model and actual navigation conditions, and insufficient adaptability, stability, and robustness of collision avoidance strategies under complex traffic environments and external disturbances. This invention provides a robust collision avoidance decision-making method for USVs that considers changes in ship loads. This method extracts control actions and ship motion response information from recent moments to construct a dynamic context sequence reflecting the impact of load changes. It then uses a GRU to perform implicit feature extraction on this dynamic context sequence, obtaining context variables characterizing the current ship dynamics. Simultaneously, by combining a SAC-based USV collision avoidance decision-making framework and a residual correction mechanism, the method adaptively optimizes collision avoidance strategies under different load conditions. This allows the strategy to adjust its collision avoidance behavior in real time according to changes in dynamic characteristics caused by load variations, thereby improving the collision avoidance success rate, decision stability, and overall robustness of USVs in scenarios with dynamic load changes, ensuring their safe and reliable navigation.

[0007] To achieve the above objectives, the technical solution of the present invention is: a robust collision avoidance decision-making method for unmanned surface vessels considering changes in ship load, comprising:

[0008] Construct a collision avoidance strategy state space, which includes the state vector of the ship itself, the state set of the target ship, and the state vector of the target point.

[0009] A dynamic context sequence is constructed, which is composed of the single-step dynamic context vectors of the most recent K moments arranged in chronological order. Each single-step dynamic context vector includes the current steering control action, the ship's surge speed, sway speed, and yaw rate, and the corresponding state increments.

[0010] The dynamic context sequence is temporally encoded using a gated recurrent unit to obtain an implicit context feature vector;

[0011] The implicit context feature vector is subjected to residual correction to obtain adaptive context features;

[0012] The collision avoidance strategy state space and the adaptive context features are input into the Actor policy network to output the collision avoidance control action.

[0013] The collision avoidance control action is mapped into a rudder angle control command and output to the servo actuator.

[0014] The present invention also provides an electronic device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the above-described robust collision avoidance decision-making method for unmanned vessels that considers dynamic changes in ship load.

[0015] The present invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the above-described robust collision avoidance decision-making method for unmanned vessels that considers dynamic changes in ship loads.

[0016] Compared with existing technologies, this invention has the following advantages: The collision avoidance decision-making method for unmanned surface vessels (USVs) that considers dynamic changes in ship load proposed in this invention addresses the problem that existing USV collision avoidance methods are based on fixed load conditions and have difficulty adapting to changes in ship dynamics caused by load variations. By constructing a dynamic context sequence that combines historical control inputs with the ship's motion response, and using a GRU to implicitly encode its features, context variables representing the current ship dynamic characteristics are extracted. At the same time, a residual correction mechanism is introduced to adaptively compensate for the implicit context features. Combined with a SAC-based decision framework and a masked attention mechanism, the USV collision avoidance strategy is adapted to different load conditions, thereby effectively improving the safety and robustness of USVs in scenarios with dynamic load changes.

Brief Description of the Drawings

[0017] Figure 1 is a schematic diagram of the collision avoidance decision-making process for the unmanned vessel of the present invention.

[0018] Figure 2 is a schematic diagram illustrating the training process of the method of the present invention.

Detailed Implementation

[0019] The technical solution of the present invention will now be described in detail with reference to the accompanying drawings.

[0020] This invention presents a robust collision avoidance decision-making method for unmanned surface vessels (USVs) considering dynamic changes in ship loads. Its core lies in constructing a dynamic context sequence reflecting the impact of load changes within a deep reinforcement learning framework. A gated recurrent unit (GRU) extracts temporal features from historical control actions and ship motion response information, yielding implicit context variables that characterize the current ship dynamics. Simultaneously, by combining a SAC-based Actor-Critic collision avoidance decision-making network and a residual correction mechanism, adaptive optimization of the collision avoidance strategy under different load conditions is achieved. This enables safe and robust USV collision avoidance decision-making under dynamically changing load conditions. The collision avoidance process of this method is shown in Figure 1, and the main steps include:

[0021] Step 1: Constructing the state space for the collision avoidance strategy;

[0022] Step 2: Dynamic context sequence construction;

[0023] Step 3: Implicit contextual feature encoding;

[0024] Step 4: Residual Correction;

[0025] Step 5: Collision avoidance maneuver calculation;

[0026] Step 6: Control command output and execution.

[0027] For step 1, after obtaining the ship's own state information, the state information of surrounding target ships, and the target point information, the system performs feature calculations and normalization on various types of information to construct the strategy state space required for unmanned vessel collision avoidance decision-making. The strategy state space consists of three parts: the ship's own state, the set of target ship states, and the target point state, and its expression is:

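[0028] $$s_t = \left[\, s_t^{\mathrm{own}},\ S_t^{\mathrm{ts}},\ s_t^{\mathrm{goal}} \,\right] \tag{1}$$

[0029] where $s_t$ denotes the overall strategy state at the current moment; $s_t^{\mathrm{own}}$ denotes the ship's own state vector; $S_t^{\mathrm{ts}}$ denotes the set of target ship states; and $s_t^{\mathrm{goal}}$ denotes the target point state vector. The specific calculation process is as follows: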

[0030] Step 1.1: Calculate the ship's state vector.

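[0031] Based on the ship's current motion state, obtain the surge speed $u_t$, sway speed $v_t$, and yaw rate $r_t$, and normalize them by the corresponding maximum reference values $u_{\max}$, $v_{\max}$, and $r_{\max}$, respectively, to obtain the ship's state vector:

[0032] $$s_t^{\mathrm{own}} = \left[\, \frac{u_t}{u_{\max}},\ \frac{v_t}{v_{\max}},\ \frac{r_t}{r_{\max}} \,\right]$$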

[0033] Step 1.2: Calculate the target point state vector.

[0034] Based on the relationship between the target point and the ship's current position, calculate the relative bearing of the target point to the ship, $\theta_t^{\mathrm{goal}}$, and the relative heading angle of the target point with respect to the ship's current heading, $\psi_t^{\mathrm{goal}}$, and thus construct the target point state vector:

[0035] $$s_t^{\mathrm{goal}} = \left[\, \theta_t^{\mathrm{goal}},\ \psi_t^{\mathrm{goal}} \,\right]$$

[0036] Step 1.3: Calculate the state vector of a single target ship.

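[0037] For the $i$-th target ship, calculate its relative distance to the own ship $d_i$, relative azimuth $\theta_i$, relative heading angle $\psi_i$, collision risk index $\mathrm{CRI}_i$, encounter type $E_i$, and the target ship's speed $V_i$, and thus construct the single target ship state vector:

[0038] $$s_t^{\mathrm{ts},i} = \left[\, d_i,\ \theta_i,\ \psi_i,\ \mathrm{CRI}_i,\ E_i,\ V_i \,\right]$$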

[0039] The formula for calculating the collision risk index is as follows:

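[0040] $$\mathrm{CRI} = W \cdot U^{\mathsf{T}}, \qquad W = \left[\, w_{\mathrm{DCPA}},\ w_{\mathrm{TCPA}},\ w_{\theta},\ w_{d},\ w_{\psi} \,\right], \qquad U = \left[\, \mu_{\mathrm{DCPA}},\ \mu_{\mathrm{TCPA}},\ \mu_{\theta},\ \mu_{d},\ \mu_{\psi} \,\right]$$

[0041] where $W$ is the weight vector of the risk factors, reflecting each factor's relative contribution to collision risk, and $U$ collects the fuzzy membership values of the risk factors, each with a value range of $[0, 1]$. $w_{\mathrm{DCPA}}$, $w_{\mathrm{TCPA}}$, $w_{\theta}$, $w_{d}$, and $w_{\psi}$ denote the weights of the distance at closest point of approach (DCPA), the time to closest point of approach (TCPA), relative bearing, relative distance, and relative heading in the collision risk, respectively; $\mu_{\mathrm{DCPA}}$, $\mu_{\mathrm{TCPA}}$, $\mu_{\theta}$, $\mu_{d}$, and $\mu_{\psi}$ denote the risk membership degrees corresponding to these five factors; and $\mathsf{T}$ denotes transpose.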
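The weighted-sum form of the collision risk index can be illustrated with a minimal sketch. The weights and membership values below are hypothetical placeholders chosen only for the example; the text does not state them, nor the membership functions that would produce them.

```python
def collision_risk_index(weights, memberships):
    """Weighted sum of fuzzy risk memberships (the product W . U^T)."""
    if len(weights) != len(memberships):
        raise ValueError("weights and memberships must have the same length")
    if any(not 0.0 <= m <= 1.0 for m in memberships):
        raise ValueError("each membership degree must lie in [0, 1]")
    return sum(w * m for w, m in zip(weights, memberships))


# Hypothetical values, ordered as: DCPA, TCPA, relative bearing,
# relative distance, relative heading.
W = [0.4, 0.3, 0.1, 0.1, 0.1]
U = [0.8, 0.6, 0.5, 0.7, 0.2]
cri = collision_risk_index(W, U)
print(round(cri, 2))
```

When the weights sum to 1 and every membership lies in [0, 1], the index also stays in [0, 1], which makes it convenient as a sort key for ranking target ships.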

[0042] Step 1.4: Construct the target ship state set.

[0043] Calculate the collision risk index corresponding to each target ship, and sort the state vectors of each target ship in descending order of collision risk index to construct the target ship state set.

[0044] $$S_t^{\mathrm{ts}} = \left\{\, s_t^{\mathrm{ts},1},\ s_t^{\mathrm{ts},2},\ \ldots,\ s_t^{\mathrm{ts},N} \,\right\}$$

[0045] where $N$ denotes the preset maximum number of target ships. If the number of target ships encountered by the own ship at the current moment is less than $N$, the missing entries are filled with zero vectors; if it is greater than $N$, the target ships are sorted from highest to lowest collision risk index and the $N$ highest-risk target ships are selected to construct the target ship state set.

[0046] Step 1.5: Construct the overall policy state space.

[0047] The ship's state vector, the target ship's state set, and the target point's state vector are concatenated according to formula (1) to form the overall strategy state space at the current moment. The constructed strategy state space can simultaneously represent the ship's motion state, the encounter situation of multiple surrounding target ships, and the target point guidance information, providing an input basis for subsequent dynamic context sequence construction and collision avoidance action decision-making.
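Steps 1.1 to 1.5 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the six-element layout of a target ship vector follows the order listed in step 1.3, while the function names, the example numbers, and the returned validity mask are assumptions made for the example.

```python
CRI_INDEX = 3    # position of the collision risk index in a target ship vector
TARGET_DIM = 6   # [distance, bearing, heading, CRI, encounter type, speed]


def build_target_set(targets, max_targets):
    """Keep the max_targets riskiest ships and zero-pad the remaining slots."""
    ranked = sorted(targets, key=lambda t: t[CRI_INDEX], reverse=True)
    states = [list(t) for t in ranked[:max_targets]]
    mask = [1] * len(states)
    while len(states) < max_targets:
        states.append([0.0] * TARGET_DIM)  # zero vector for a missing target
        mask.append(0)                     # 0 marks a padded slot
    return states, mask


def build_policy_state(own, own_max, target_states, goal):
    """Concatenate normalized own-ship state, target set, and goal state."""
    own_vec = [x / m for x, m in zip(own, own_max)]
    flat_targets = [x for s in target_states for x in s]
    return own_vec + flat_targets + list(goal)


# Hypothetical example: two target ships, room for three.
targets = [
    [0.6, 0.1, 0.2, 0.3, 1.0, 0.5],
    [0.2, -0.3, 0.9, 0.7, 2.0, 0.8],
]
states, mask = build_target_set(targets, max_targets=3)
state = build_policy_state((1.0, 0.1, 0.05), (2.0, 0.5, 0.5), states, (0.25, -0.1))
print(len(state), mask)
```

Returning the mask alongside the padded set keeps track of which slots hold real target ships, which is the information a masked attention module needs later.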

[0048] For step 2, after obtaining the ship's motion state and control action information at the current and adjacent times, the system extracts historical control inputs and ship motion response characteristics to construct a dynamic context sequence characterizing the ship's recent dynamic changes. This dynamic context sequence reflects the temporal correlation between control inputs and motion responses under dynamic load conditions, thus providing an input basis for subsequent implicit context coding. The specific construction process is as follows.

[0049] Step 2.1: Extract single-step dynamic context features.

[0050] For the $t$-th decision-making moment, extract the steering control action at the current moment, $a_t$, and the ship's current surge speed $u_t$, sway speed $v_t$, and yaw rate $r_t$. Simultaneously, the surge speed increment $\Delta u_t$, sway speed increment $\Delta v_t$, and yaw rate increment $\Delta r_t$ are calculated from the ship's motion states at adjacent moments, where

[0051]

[0052] The above features characterize the current control input and the changes in the ship's motion state that it produces.

[0053] Step 2.2: Construct a single-step dynamic context vector.

[0054] The current control action, the ship's motion state, and the corresponding state increments are combined in a preset order to obtain the single-step dynamic context vector at the $t$-th time step:

[0055] $$c_t = \left[\, a_t,\ u_t,\ v_t,\ r_t,\ \Delta u_t,\ \Delta v_t,\ \Delta r_t \,\right]$$

[0056] Step 2.3: Construct dynamic context sequences.

[0057] The single-step dynamic context vectors of the most recent $K$ moments are arranged in chronological order to form a dynamic context sequence:

[0058] $$C_t = \left[\, c_{t-K+1},\ c_{t-K+2},\ \ldots,\ c_t \,\right]$$

[0059] where $K$ denotes the preset length of the historical sequence and $C_t$ denotes the dynamic context sequence input to the encoding network at the current moment. By introducing context information from multiple consecutive moments, the continuous evolution of the control input and the ship's motion response in the time dimension can be characterized.

[0060] Step 2.4: Unify sequence length processing.

[0061] When the number of valid historical context vectors is less than $K$, the beginning of the sequence is padded with the most recent valid context vector so that the dynamic context sequence keeps a fixed dimension. If there is no valid historical context information at the current time step, the dynamic context sequence is initialized with zero vectors. This ensures that the length of the dynamic context sequence input to the subsequent encoding network remains consistent at every time step.
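Steps 2.1 to 2.4 can be sketched as below. Two points are assumptions made for the example: the increments are taken as backward differences (current value minus previous value), and "most recent valid context vector" is read literally as the latest vector in the history. The function names are hypothetical.

```python
def context_vector(action, prev_motion, motion):
    """Return [a, u, v, r, du, dv, dr]; increments are backward differences (assumed)."""
    increments = [m - p for m, p in zip(motion, prev_motion)]
    return [action] + list(motion) + increments


def context_sequence(history, K, dim=7):
    """Return exactly K context vectors in chronological order."""
    if not history:
        # No valid history yet: initialize with zero vectors.
        return [[0.0] * dim for _ in range(K)]
    recent = [list(c) for c in history[-K:]]
    # Pad the beginning with copies of the latest valid vector (literal reading).
    padding = [list(recent[-1]) for _ in range(K - len(recent))]
    return padding + recent


c1 = context_vector(0.2, (1.0, 0.0, 0.0), (1.5, 0.5, 0.25))
seq = context_sequence([c1], K=3)
print(len(seq), seq[0])
```

Whatever padding convention is chosen, the point of the step is that the encoder always receives a sequence of length K, including at the start of an episode.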

[0062] Step 2.5: Form subsequent encoded input.

[0063] The constructed dynamic context sequence $C_t$ serves as input to the subsequent implicit context coding module, which extracts implicit features characterizing the ship's dynamics under the current load changes, providing a basis for subsequent residual correction and collision avoidance action decisions.

[0064] For step 3, after obtaining the dynamic context sequence constructed in step 2, the system uses a gated recurrent unit (GRU) to temporally encode the sequence and extract the temporal correlation between historical control inputs and the ship's motion response, thereby obtaining implicit context features that characterize the ship's dynamics under the current load changes. These implicit context features provide supplementary dynamic information for subsequent collision avoidance decision-making, enhancing the strategy's adaptability to time-varying dynamic characteristics under different load conditions.

[0065] Step 3.1: Determine the encoder input.

[0066] The dynamic context sequence $C_t$ constructed in step 2 serves as input to the implicit context encoder. By introducing contextual features from multiple consecutive time points, it reflects how recent control inputs and the ship's dynamic response evolve together.

[0067] Step 3.2: Perform GRU timing coding.

[0068] The dynamic context sequence $C_t$ is input into a GRU encoding network in chronological order to recursively model the temporal dependencies in the sequence. For the single-step dynamic context vector $c_k$ input at the $k$-th time step, the GRU calculates the current hidden state from the current input and the hidden state of the previous time step. Its expression can be written as:

[0069] $$h_k = \mathrm{GRU}\left(c_k,\ h_{k-1}\right)$$

[0070] where $h_{k-1}$ denotes the hidden state at the previous moment and $h_k$ denotes the hidden state at the current moment. As the dynamic context sequence is input sequentially, the GRU gradually aggregates historical information to extract temporal features that reflect the current trend of the ship's dynamics.

[0071] Step 3.3: Generate implicit context feature vectors.

[0072] After completing the temporal encoding of the entire dynamic context sequence, the hidden state at the last time step of the GRU is taken as the implicit context feature vector at the current time step, denoted as:

[0073] $$z_t = h_t$$

[0074] where $z_t$ denotes the implicit context features at the current moment, taken as the GRU hidden state after the last element $c_t$ of the sequence has been processed. The implicit context feature vector characterizes the aggregate temporal behavior of the ship's dynamic response under recent control inputs, thereby indirectly reflecting the impact of dynamic load changes on the ship's motion characteristics.
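The recursive encoding of steps 3.2 and 3.3 can be sketched with a small GRU cell written in plain Python. This is a minimal illustration under stated assumptions: the weights are random rather than trained, biases are omitted, the hidden size of 4 is arbitrary, and the class and function names are hypothetical. The gate equations follow one common GRU formulation; the text does not say which variant is used.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Untrained GRU cell with update (z), reset (r), and candidate (n) gates."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.W = {g: rand_matrix(hidden_dim, input_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], rh))]
        # New hidden state: blend of candidate and previous hidden state.
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def encode(cell, sequence):
    """Feed the sequence in chronological order; return the last hidden state."""
    h = [0.0] * cell.hidden_dim
    for x in sequence:
        h = cell.step(x, h)
    return h


cell = GRUCell(input_dim=7, hidden_dim=4, seed=0)
sequence = [[0.1 * k] * 7 for k in range(3)]  # K = 3 placeholder context vectors
z = encode(cell, sequence)
print(len(z))
```

Only the final hidden state is kept, so the encoder compresses the whole K-step history into one fixed-size vector regardless of K.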

[0075] Step 3.4: Formulate inputs for subsequent decisions.

[0076] The obtained implicit context feature vector $z_t$ serves as the dynamic context information at the current moment and is used together with the collision avoidance strategy state space constructed in step 1 for subsequent residual correction and collision avoidance action decision calculation. By introducing the implicit context features, the collision avoidance decision process relies not only on the current static state information but also on the time-varying dynamic features contained in the recent motion response sequence, thereby improving the decision adaptability and robustness of the unmanned vessel in scenarios with dynamic load changes.

[0077] For step 4, after obtaining the dynamic context sequence constructed in step 2 and the implicit context features encoded in step 3, the system further adjusts the implicit context features through a residual correction mechanism to enhance their adaptability to the ship's dynamic characteristics under the current dynamic load changes. The residual correction mechanism fuses historical information from the dynamic context sequence with the implicit context features to generate a context correction value, and then performs residual superposition on the original implicit context features to obtain adaptive context features for subsequent collision avoidance decision-making. The specific steps are as follows:

[0078] Step 4.1: Flatten the dynamic context sequence.

[0079] The dynamic context sequence $C_t$ obtained in step 2 is expanded in chronological order into a one-dimensional vector, denoted as

[0080] $$\bar{c}_t = \mathrm{Flatten}\left(C_t\right)$$

[0081] where $\mathrm{Flatten}(\cdot)$ denotes the flattening operation. This operation converts the dynamic context features of the $K$ historical time steps into a single fixed-dimensional vector so that they can be jointly computed with the implicit context features.

[0082] Step 4.2: Construct the residual correction network input.

[0083] The implicit context feature vector $z_t$ obtained in step 3 is concatenated with the flattened dynamic context vector $\bar{c}_t$ to form the input of the residual correction network, expressed as follows:

[0084] $$x_t = \left[\, z_t;\ \bar{c}_t \,\right]$$

[0085] Step 4.3: Calculate the context correction amount.

[0086] The input vector $x_t$ is fed into the residual correction network to calculate the raw correction amount, expressed as

[0087] $$\Delta\tilde{z}_t = f_{\mathrm{res}}\left(x_t\right)$$

[0088] where $f_{\mathrm{res}}$ denotes the residual correction network. In this invention, the residual correction network is a multilayer perceptron used to learn the correction direction and magnitude of the implicit context features from the joint input. To limit the range of the correction amount, the raw correction amount is further passed through hyperbolic tangent compression and multiplied by a correction coefficient to obtain the final context correction amount:

[0089] $$\Delta z_t = \alpha \tanh\left(\Delta\tilde{z}_t\right)$$

[0090] where $\alpha$ denotes the preset correction coefficient, used to control the magnitude of the residual correction.

[0091] Step 4.4: Generate adaptive contextual features.

[0092] The implicit context features $z_t$ obtained in step 3 are superimposed with the context correction amount $\Delta z_t$ calculated in step 4.3 as a residual to obtain the corrected adaptive context features, expressed as follows:

[0093] $$\hat{z}_t = z_t + \Delta z_t$$

[0094] where $\hat{z}_t$ denotes the context feature vector after residual correction. Through the above residual correction process, the original implicit context feature information is preserved while adaptive compensation is applied based on the historical information contained in the recent dynamic context sequence, thereby improving the ability of subsequent collision avoidance decisions to represent dynamic load changes.
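Steps 4.1 to 4.4 can be sketched as follows. For brevity the residual correction network is stood in for by a single linear layer, whereas the text specifies a multilayer perceptron; the weights, the coefficient value 0.1, and the function name are assumptions made for the example.

```python
import math


def residual_correct(z, context_seq, weights, bias, alpha):
    """Return z + alpha * tanh(f([z; flatten(C)])), with f a single linear layer here."""
    flat = [x for c in context_seq for x in c]   # step 4.1: flatten in time order
    joint = list(z) + flat                       # step 4.2: concatenate
    raw = [sum(w * x for w, x in zip(row, joint)) + b
           for row, b in zip(weights, bias)]     # step 4.3: raw correction
    delta = [alpha * math.tanh(v) for v in raw]  # bounded to (-alpha, alpha)
    return [zi + di for zi, di in zip(z, delta)] # step 4.4: residual sum


z = [0.5, -0.2]
context_seq = [[1.0, 0.0], [0.0, 1.0]]           # K = 2 toy context vectors
weights = [[0.3, -0.1, 0.2, 0.0],
           [0.0, 0.4, -0.5, 0.1]]                # one row per output, len(z) + 4 inputs
bias = [0.0, 0.1]
alpha = 0.1
z_hat = residual_correct(z, context_seq, weights, bias, alpha)
print(z_hat)
```

Because the hyperbolic tangent is bounded, each component of the corrected feature differs from the original by less than the correction coefficient, so the correction can adjust but never overwrite the GRU output.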

[0095] Step 4.5: Formulate inputs for subsequent action decisions.

[0096] The adaptive context features $\hat{z}_t$ after residual correction serve as context input to the subsequent collision avoidance decision-making module, helping the strategy network generate control commands better adapted to the ship's dynamic characteristics under current load changes.

[0097] For step 5, after obtaining the collision avoidance strategy state space constructed in step 1 and the adaptive context features obtained in step 4 after residual correction, the current strategy state is input into the Actor strategy network. The Actor strategy network first separates the current ship state and the target ship state, and uses a masked attention mechanism to weight and encode the target ship state set to extract the target ship interaction features related to the current collision avoidance decision. Subsequently, the current ship state, the target ship interaction features, and the adaptive context features after residual correction are fused and input into a multilayer perceptron network to output the current action distribution parameters, and further calculate the collision avoidance control action.

[0098] Step 5.1: State splitting.

[0099] After the current policy state $s_t$ obtained in step 1 is input into the Actor network, it is split according to preset dimensions into the state vector of the own ship and the target point, $s_t^{\mathrm{og}}$, and the target ship state set $S_t^{\mathrm{ts}}$.

[0100] Step 5.2: Target ship attention encoding.

[0101] $s_t^{\mathrm{og}}$ and $S_t^{\mathrm{ts}}$ are jointly input into the masked attention module to calculate the target ship interaction feature vector $e_t$, expressed as

[0102] $$e_t = f_{\mathrm{att}}\left(s_t^{\mathrm{og}},\ S_t^{\mathrm{ts}},\ M_t\right)$$

[0103] where $f_{\mathrm{att}}$ denotes the masked attention encoding function; $M_t$ denotes the target ship validity mask, used to distinguish real target ship states from zero-filled states; and $e_t$ denotes the attention-weighted target ship interaction feature vector. This step highlights the impact of high-risk target ships on decision-making and suppresses the interference of invalid or low-relevance targets with the decision outcome.
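The role of the validity mask can be illustrated with a minimal sketch. Scaled dot-product attention is assumed here because the text does not specify the attention form; in a trained network the query, keys, and values would come from learned projections of the own-ship and target ship states, whereas this example passes them in directly. All names and numbers are hypothetical.

```python
import math


def masked_attention(query, keys, values, mask):
    """Attention over target slots; slots with mask 0 receive exactly zero weight."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    valid = [s for s, m in zip(scores, mask) if m]
    if not valid:
        # No real target ship present: return a zero interaction feature.
        return [0.0] * len(values[0]), [0.0] * len(mask)
    top = max(valid)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    weights = [e / total for e in exps]
    feature = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
    return feature, weights


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]    # third slot is zero padding
values = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
mask = [1, 1, 0]
feature, weights = masked_attention(query, keys, values, mask)
print(weights)
```

Without the mask, a zero-padded slot would still receive a nonzero softmax weight and dilute the contribution of real target ships, which is the interference the validity mask is meant to suppress.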

[0104] Step 5.3: Construct strategy fusion features.

[0105] The target ship interaction feature vector $e_t$ obtained in step 5.2, the state features of the own ship $s_t^{\mathrm{og}}$, and the adaptive context features $\hat{z}_t$ obtained in step 4 after residual correction are concatenated to obtain the strategy fusion feature, expressed as:

[0106] $$g_t = \left[\, e_t;\ s_t^{\mathrm{og}};\ \hat{z}_t \,\right]$$

[0107] The strategy fusion features simultaneously include the current motion state of the ship, interaction information with key target ships, and implicit dynamic context features related to dynamic changes in loads.

[0108] Step 5.4: Calculate action distribution parameters. The strategy fusion feature $g_t$ is input into a multilayer perceptron network, which outputs the mean vector and the log-standard-deviation vector of the action distribution, expressed as follows:

[0109] $$\left[\, \mu_t,\ \log\sigma_t \,\right] = \mathrm{MLP}\left(g_t\right)$$

[0110] where $\mu_t$ denotes the action mean vector and $\log\sigma_t$ denotes the action log-standard-deviation vector. To ensure numerical stability, a truncation constraint is applied to the log-standard-deviation vector, after which the standard deviation vector $\sigma_t$ is obtained:

[0111]

[0112] Step 5.5: Generate collision avoidance actions.

[0113] Based on the action distribution parameters obtained in step 5.4, a Gaussian action distribution is constructed and the raw action sample at the current time, $\tilde{a}_t$, is generated, expressed as

[0114] $$\tilde{a}_t \sim \mathcal{N}\left(\mu_t,\ \sigma_t^{2}\right)$$

[0115] Subsequently, a hyperbolic tangent mapping is applied to the raw action sample to obtain the final collision avoidance action:

[0116] $$a_t = \tanh\left(\tilde{a}_t\right)$$

[0117] where $a_t$ denotes the collision avoidance control action output at the current moment. Through the hyperbolic tangent mapping, the action output is limited to a preset control range, thereby satisfying the control constraints of the unmanned surface vessel's actuators.
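Steps 5.4 and 5.5 can be sketched as below: clip the log standard deviation, exponentiate, sample from the Gaussian, and squash with the hyperbolic tangent. The clipping bounds of -20 and 2 are hypothetical values of the kind often seen in SAC implementations; the text does not state the bounds used here. The function name is also an assumption.

```python
import math
import random

LOG_STD_MIN = -20.0  # hypothetical lower clipping bound
LOG_STD_MAX = 2.0    # hypothetical upper clipping bound


def sample_action(mean, log_std, rng):
    """Sample a tanh-squashed Gaussian action, one component per action dimension."""
    action = []
    for mu, ls in zip(mean, log_std):
        ls = min(max(ls, LOG_STD_MIN), LOG_STD_MAX)  # truncation constraint
        sigma = math.exp(ls)                         # standard deviation
        raw = rng.gauss(mu, sigma)                   # raw Gaussian sample
        action.append(math.tanh(raw))                # squash into [-1, 1]
    return action


rng = random.Random(0)
a = sample_action([0.3], [-1.0], rng)
print(a)
```

Sampling is what SAC uses during training for exploration; at deployment a common alternative is to skip the sampling and apply the hyperbolic tangent to the mean directly, though the text does not say which is used.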

[0118] For step 6, after obtaining the collision avoidance action at the current moment in step 5, the system maps the action output into a target rudder angle command that can be executed by the unmanned vessel's servo motor, and sends it to the motion control module for execution, thereby driving the unmanned vessel to complete the corresponding steering collision avoidance operation.

[0119] Step 6.1: Perform rudder angle mapping calculation.

[0120] Based on the maximum left and maximum right full rudder angles allowed by the unmanned vessel's servo motor, the collision avoidance control action $a_t$ output at the current moment is linearly mapped to the actual target rudder angle $\delta_t$, which can be expressed as

[0121] $$\delta_t = \delta_{\min} + \frac{a_t + 1}{2}\left(\delta_{\max} - \delta_{\min}\right)$$

[0122] where $\delta_{\min}$ and $\delta_{\max}$ denote the minimum and maximum rudder angles, respectively.
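The rudder angle mapping of step 6.1 can be sketched as a linear interpolation. The sketch assumes the action lies in [-1, 1] (the range of the hyperbolic tangent) and that -1 corresponds to the minimum rudder angle; the ±35 degree limits and the function name are hypothetical.

```python
def action_to_rudder(action, delta_min, delta_max):
    """Linearly map an action in [-1, 1] to a rudder angle in [delta_min, delta_max]."""
    a = min(max(action, -1.0), 1.0)  # guard against out-of-range inputs
    return delta_min + (a + 1.0) / 2.0 * (delta_max - delta_min)


# Hypothetical rudder limits in degrees.
for a in (-1.0, 0.0, 1.0):
    print(a, action_to_rudder(a, -35.0, 35.0))
```

With symmetric limits the mapping reduces to scaling the action by the maximum rudder angle; the general form also covers servos whose left and right limits differ.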

[0123] Step 6.2: Output rudder angle control command.

[0124] The target rudder angle $\delta_t$ calculated in step 6.1 is converted into control instructions that the servo control module can receive and sent to the servo actuator of the unmanned vessel. The rudder angle control commands drive the rudder surface deflection, enabling the unmanned vessel to perform a steering operation according to the current collision avoidance decision.

[0125] Step 6.3: Complete the action execution.

[0126] The servo actuator operates according to the target rudder angle. By controlling the rotation of the rudder surfaces, the unmanned surface vessel (USV) generates a corresponding turning motion under the action of hydrodynamics, thereby completing the collision avoidance maneuver for the current moment. Subsequently, the system enters the next decision-making moment and continues to execute subsequent rolling collision avoidance decisions based on the new navigation status.

[0127] The following are specific implementation examples of the present invention.

[0128] To improve the collision avoidance safety, decision stability, and overall robustness of unmanned surface vessels (USVs) under dynamic load changes, this invention proposes a robust collision avoidance decision-making method for USVs based on the Soft Actor-Critic (SAC) algorithm. The method integrates GRU implicit context encoding, a residual correction mechanism, and a masked attention network. Existing collision avoidance strategies assume a fixed load and therefore adapt poorly to the time-varying ship dynamic parameters and model mismatch caused by load changes. To address this, the method constructs a dynamic context sequence consisting of historical control actions, the ship's surge velocity, sway velocity, and yaw rate, and their state increments. This sequence is encoded by a GRU to extract context features that characterize the current implicit dynamic state. A residual correction network is introduced into the Actor branch to generate, from the Critic context, an Actor context representation better suited to action decisions. Furthermore, a masked attention mechanism performs a weighted aggregation of the states of multiple target ships, enhancing the strategy's adaptability in complex multi-ship encounter scenarios and under different load conditions. With this design, the invention achieves intelligent collision avoidance decision-making for unmanned vessels that balances collision avoidance effectiveness, adaptability to operating conditions, and robustness under the combined effects of multi-target ship interaction and dynamic load changes. As shown in Figure 2, the specific training steps are as follows:

[0129] Step 1: Dynamic modeling and load variation scenario design;

[0130] Step 2: Design of the collision avoidance strategy state space and dynamic context sequence;

[0131] Step 3: Design the reward function;

[0132] Step 4: Design of an Actor-Critic network integrating GRU, residual correction, and masked attention;

[0133] Step 5: Loss function design;

[0134] Step 6: Design of the experience buffer pool and context sampling mechanism;

[0135] Step 7: Training process design.

[0136] For step 1, a three-degree-of-freedom dynamic environment for the unmanned vessel, considering dynamic load changes, is first constructed, and a multi-target ship encounter scenario is designed based on this environment. The environment utilizes the MMG (Maneuvering Modeling Group) three-degree-of-freedom ship maneuvering motion model to describe the ship's dynamic characteristics. At each moment, the environment updates the ship's state through numerical integration based on the control input and the dynamic parameters corresponding to the current load condition, thus forming a navigation simulation environment that considers the impact of load changes.

[0137] Step 1.1: Construct a three-degree-of-freedom dynamic model of the ship

[0138] In this embodiment, the ship is modeled using the MMG three-degree-of-freedom maneuvering motion model, and its state is updated using the fourth-order Runge-Kutta numerical integration method. Within each decision cycle, based on the propeller speed and rudder angle in the current control inputs, the dynamic model is invoked to numerically solve for the ship's state, thereby obtaining the ship's position, heading, and velocity at the next moment. The simulation step size is set to 1 s, and the simulation area is 200 m × 200 m, within which the ship and the target ship dynamically interact.
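
As a non-limiting illustration of the state update in step 1.1, one fourth-order Runge-Kutta step with the control input held constant over the step can be sketched as follows. The function `toy_dynamics` is a hypothetical first-order placeholder used only to exercise the integrator; it is not the MMG model, whose equations and parameters are not reproduced here.

```python
from typing import Callable, List, Sequence


def rk4_step(f: Callable[[Sequence[float], Sequence[float]], List[float]],
             x: Sequence[float], u: Sequence[float], dt: float) -> List[float]:
    """Advance state x by one step of size dt under a constant control u."""
    def shifted(scale: float, k: Sequence[float]) -> List[float]:
        # Element-wise x + scale * k
        return [xi + scale * ki for xi, ki in zip(x, k)]

    k1 = f(x, u)
    k2 = f(shifted(0.5 * dt, k1), u)
    k3 = f(shifted(0.5 * dt, k2), u)
    k4 = f(shifted(dt, k3), u)
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def toy_dynamics(x: Sequence[float], u: Sequence[float]) -> List[float]:
    # Hypothetical placeholder: dx/dt = -x + u (NOT the MMG model).
    return [-xi + ui for xi, ui in zip(x, u)]
```

In the embodiment, `f` would be the three-degree-of-freedom dynamic model instantiated for the current load condition, `u` would contain the propeller speed and rudder angle, and `dt` would be the 1 s simulation step.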

[0139] Step 1.2: Design load variation conditions

[0140] To simulate the changes in the ship's dynamic parameters under different load conditions, five typical load cases are preset in the environment: Shallow Even (SE), Middle Even (ME), Middle Aft-trim (MA), Middle Fore-trim (MF), and Deep Even (DE). A corresponding set of ship dynamic parameters is configured for each load case. At the beginning of each training round, one of the five load cases is randomly selected as the load condition for the ship in the current round. The ship's dynamic model is then instantiated based on the selected load case parameters.
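
As a non-limiting illustration of step 1.2, the per-round load case selection can be sketched as follows. The five case names come from the description above; the uniform selection probability, the integer numbering, and all identifiers are assumptions made only for this sketch, and the dynamic parameter sets themselves are not shown.

```python
import random
from enum import Enum


class LoadCase(Enum):
    SE = "Shallow Even"
    ME = "Middle Even"
    MA = "Middle Aft-trim"
    MF = "Middle Fore-trim"
    DE = "Deep Even"


# Discrete operating condition number per load case (the index order is an assumption).
CONDITION_ID = {case: i for i, case in enumerate(LoadCase)}


def sample_load_case(rng: random.Random) -> LoadCase:
    """Pick one of the five load cases uniformly at random for a new training round."""
    return rng.choice(list(LoadCase))
```

The selected case would then be used to look up the corresponding set of ship dynamic parameters when the dynamic model is instantiated.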

[0141] Step 1.3: Initialize the ship's state

[0142] At the start of each round, the initial heading of the ship is randomly selected, and the initial coordinates of the ship are determined based on the center position of the simulation area. To reduce the impact of the initial speed setting on training stability, before the formal scene interaction begins, a steady-state velocity solution is performed on the ship's dynamics model with the propeller speed fixed at full revolutions and the rudder angle at zero, to obtain the steady-state surge velocity under the current load condition. Subsequently, while retaining the preset initial position and heading, the steady-state surge velocity is used as the initial forward velocity, while the sway velocity and yaw rate are initialized to zero.

[0143] Step 1.4: Set control input constraints

[0144] In this embodiment, the vessel employs a control method with a fixed propeller speed and variable rudder angle. The propeller speed is fixed at full revs, and the rudder angle control is obtained by mapping the output action of the decision network. The maximum rudder angle is set to ±35° in the environment, and the current rudder angle state is set as the collision avoidance decision variable for the vessel. The normalized action output by the strategy network is mapped into a target rudder angle command in the environment, and then used as the steering control input for dynamic integration calculation at the current moment. By fixing the propeller speed and constraining the rudder angle range, the physical executability of the control input can be ensured while highlighting the impact of differences in the vessel's dynamic response under different load conditions on the collision avoidance decision.

[0145] Step 1.5: Design the scenario of the encounter between the target point and multiple target ships.

[0146] At the start of each round, the target point position is set according to the ship's initial course, giving the ship a clear navigation mission direction. Simultaneously, the environment randomly generates a number of target ships; in random target ship mode, the number of target ships is sampled from 0 to 10 according to a preset probability distribution. Each target ship is initialized using a scenario construction method based on encounter type. First, one encounter relationship is randomly selected from head-on, starboard crossing, port crossing, overtaking, and being overtaken. Then, combined with the target point direction, the time to closest point of approach, and the target ship's relative speed range, the target ship's initial position, course, and speed are calculated, ensuring that it forms a representative encounter scenario with the ship during subsequent navigation. To prevent the scenario from being too dense, the environment also sets a minimum generation spacing constraint so that the initial positions of different target ships are not too close.

[0147] Step 1.6: Design the target ship motion update mechanism

[0148] The target ship is updated using a speed- and heading-maintaining motion method in the environment. For each target ship, its current speed and heading are kept constant at each time step, and its position is recursively updated based on the discrete time step.
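
As a non-limiting illustration of step 1.6, the constant-speed, constant-heading position update can be sketched as follows. The coordinate convention (x east, y north, heading measured clockwise from north, in radians) and the names used are assumptions made only for this sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class TargetShip:
    x: float        # east position, m
    y: float        # north position, m
    speed: float    # m/s, held constant
    heading: float  # rad, clockwise from north, held constant


def step_target(ts: TargetShip, dt: float) -> TargetShip:
    """Advance a target ship by one time step with unchanged speed and heading."""
    return TargetShip(
        x=ts.x + ts.speed * math.sin(ts.heading) * dt,
        y=ts.y + ts.speed * math.cos(ts.heading) * dt,
        speed=ts.speed,
        heading=ts.heading,
    )
```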

[0149] Regarding step 2: the specific implementation method is basically the same as steps 1 and 2 in the invention content, that is, to construct the collision avoidance strategy state space based on the state of the ship itself, the state of the target ship and the target point information, and to construct a dynamic context sequence based on historical control actions and the motion response information of the ship itself.

[0150] Regarding step 3: To guide the unmanned surface vessel (USV) in learning a collision avoidance strategy that balances target point approach, collision avoidance, rule compliance, and control stability under dynamic load changes and multi-target encounters, this implementation constructs a comprehensive reward function consisting of a target guidance term, a safety constraint term, and a motion stability term. At any given moment, the total reward can be expressed as:

[0151]

[0152] where the terms are, in order, the target point distance reward, the target point heading reward, the collision and arrival reward, the rule-related reward, the collision risk reward, and the velocity obstacle reward. The total reward is obtained by summing all terms after weight scaling and normalization.

[0153] Step 3.1: Design the target point distance reward.

[0154] To encourage the ship to continuously approach the target point, a distance reward is constructed based on the change in distance to the target point between the current time and the previous time. The quantities used are the Euclidean distance from the ship to the target point at the current time, the corresponding distance at the previous time, the speed of the ship at the previous time, and the simulation step size. The distance reward is designed as follows:

[0155]

[0156] This reward reflects the ship's progress toward the target point. The reward increases when the ship effectively approaches the target point, and decreases when the ship moves away from the target point or its rate of approach decreases.

[0157] Step 3.2: Design of target point heading reward.

[0158] To encourage the ship to maintain a reasonable course toward the target point, a heading reward is constructed based on the deviation between the ship's current course and the relative bearing of the target point. Given the relative bearing angle of the target point with respect to the ship, the heading reward is designed as follows:

[0159]

[0160] Therefore, the closer the ship's course is to the target point, the closer the reward is to 0; the greater the deviation, the smaller the reward, thus prompting the ship to maintain its course towards the target point while performing collision avoidance maneuvers.

[0161] Step 3.3: Design of the collision and arrival reward.

[0162] To strengthen safe navigation constraints, this implementation method sets explicit rewards or penalties for three types of events: boundary crossing, collision, and reaching the target point. When the ship's position exceeds the boundary of the simulation area, a large negative reward is given; when the distance between the ship and any target ship is less than the collision threshold, a collision is considered to have occurred, and a large negative reward is given; when the distance between the ship and the target point is less than the target point reaching threshold, a large positive reward is given. Specifically, in this implementation method, boundary crossing and collision both correspond to...

[0163]

[0164] When the ship reaches the target point, the corresponding reward is:

[0165]

[0166] Step 3.4: Design of the penalty for violating collision avoidance rules.

[0167] To improve the model's compliance with international maritime collision avoidance rules, a penalty term is added for single-ship encounter scenarios. This term guides the unmanned surface vessel (USV) to take avoidance measures that comply with the collision avoidance rules. Its calculation is shown in the following formula:

[0168]

[0169] By introducing this reward, deviations from the rules can be suppressed in typical rule scenarios, thereby improving the rule compliance of the strategy in critical encounter situations.

[0170] Step 3.5: Collision risk reward design.

[0171] To mitigate high-risk encounters before a collision occurs, this implementation introduces a reward based on a collision risk index. For every target vessel within the ship's visual range, the collision risk index between the ship and that target vessel is first calculated, and the risk is then assessed together with the corresponding time to closest point of approach (TCPA). The risk penalty is defined as follows:

[0172]

[0173] Subsequently, the collision risk penalties of all target ships within visual range are averaged to obtain the collision risk reward at the current time:

[0174]

[0175] where the count is the number of target ships currently within visual range, and the index identifies the individual target ship.

[0176] Step 3.6: Design of the velocity obstacle (VO) reward.

[0177] If the velocity of the own ship (OS) falls within the velocity obstacle zone formed by a target ship (TS), a collision is possible. To address this, this invention introduces a VO penalty term into the reward function. This term penalizes actions that could lead to a collision during reinforcement learning training, thereby driving the policy to avoid unsafe velocity regions. Its calculation is shown in the following formula:

[0178]

[0179] where the vector is the velocity vector of the own ship. When the velocity lies outside the velocity obstacle zone of a given target ship, no collision is predicted at the current velocity and the reward value is 0. When the velocity falls within the velocity obstacle zone of that target ship, a collision may occur in the future, and a negative reward is therefore given.

[0180] Subsequently, the velocity obstacle rewards of all target ships within visual range are averaged to obtain the velocity obstacle reward at the current time:

[0181]

[0182] where the count is the number of target ships currently within visual range, and the index identifies the individual target ship.
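
As a non-limiting illustration of step 3.6, the velocity obstacle test and the averaging over visible target ships can be sketched as follows. The sketch uses the standard collision cone test for a circular safety region; the safety radius, the penalty value of -1, and all identifiers are assumptions made only for this sketch and are not taken from the original formula.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def in_velocity_obstacle(p_os: Vec, v_os: Vec, p_ts: Vec, v_ts: Vec,
                         radius: float) -> bool:
    """True if the own ship's velocity lies in the collision cone of one target ship."""
    px, py = p_ts[0] - p_os[0], p_ts[1] - p_os[1]  # position of TS relative to OS
    vx, vy = v_os[0] - v_ts[0], v_os[1] - v_ts[1]  # velocity of OS relative to TS
    dist = math.hypot(px, py)
    if dist <= radius:
        return True   # already inside the safety circle
    speed = math.hypot(vx, vy)
    if speed == 0.0:
        return False  # no relative motion, separation stays constant
    closing = px * vx + py * vy
    if closing <= 0.0:
        return False  # the ships are not approaching each other
    # Compare the angle between relative velocity and line of sight
    # with the cone half-angle asin(radius / dist), using cosines.
    cos_angle = closing / (dist * speed)
    cos_half = math.sqrt(1.0 - (radius / dist) ** 2)
    return cos_angle >= cos_half


def vo_reward(p_os: Vec, v_os: Vec, targets: List[Tuple[Vec, Vec]],
              radius: float, penalty: float = -1.0) -> float:
    """Average the per-ship VO penalty over all visible target ships."""
    if not targets:
        return 0.0
    total = sum(penalty if in_velocity_obstacle(p_os, v_os, p, v, radius) else 0.0
                for p, v in targets)
    return total / len(targets)
```

For example, with one head-on target ship inside the cone and one target ship moving in parallel at the same velocity, the averaged reward is half of the per-ship penalty.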

[0183] Step 3.7: Adaptive scaling and normalization combination of reward weights.

[0184] To balance the effects of different reward items under varying traffic density scenarios, this implementation adaptively adjusts the weights of the distance and heading rewards based on the number of visible target ships. Given the base weights of these two rewards and the number of target ships currently within visual range, their effective weights are defined as:

[0185]

[0186] The weights of the remaining reward items remain fixed, i.e.

[0187]

[0188] Then an auxiliary quantity is defined, and each original reward item is scaled and normalized according to the following formula:

[0189]

[0190] The final total reward is expressed as .

[0191] The base weights are set to .

[0192] For step 4: To enable the collision avoidance strategy to simultaneously utilize the temporal information in the ship's control-motion response sequence and the interactive information in multi-target ship encounter scenarios, an Actor-Critic network structure integrating GRU, residual correction, and masked attention mechanisms is constructed. This network mainly includes a GRU implicit context encoder, a residual correction branch, an Actor policy network, a dual-Q Critic value network, and a target Critic network.

[0193] The GRU implicit context encoder is used to extract temporal features from the dynamic context sequence to obtain implicit context features that characterize the current implicit dynamic state. The residual correction branch is used to generate adaptive context features that are more suitable for action decisions based on the implicit context features. The Actor policy network combines the ship's state, target ship attention features, and adaptive context features to generate the collision avoidance action at the current moment. The dual-Q Critic value network combines the ship's state, target ship attention features, current action, and implicit context features to complete state-action value estimation. Both the Actor and Critic networks incorporate a masked attention mechanism to weight and encode the target ship state set, thereby highlighting the impact of high-risk target ships on decision-making and value estimation, and suppressing the interference of zero-filled target ships on the network output.
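
As a non-limiting illustration of how a masked attention mechanism can suppress zero-filled target ship slots, a scaled dot-product variant can be sketched as follows. In the actual network the query, keys, and values would be produced by learned projections, which are omitted here; the choice of scaled dot-product scoring, the rule that an all-zero state vector marks a padded slot, and all identifiers are assumptions made only for this sketch.

```python
import math
from typing import List, Sequence


def padding_mask(states: Sequence[Sequence[float]]) -> List[bool]:
    """Mark a slot as real (True) unless its state vector is entirely zero."""
    return [any(x != 0.0 for x in s) for s in states]


def masked_attention(query: Sequence[float],
                     keys: Sequence[Sequence[float]],
                     values: Sequence[Sequence[float]],
                     mask: Sequence[bool]) -> List[float]:
    """Aggregate values with softmax weights; masked slots receive zero weight."""
    out_dim = len(values[0]) if values else 0
    if not any(mask):
        return [0.0] * out_dim  # no real target ship: return a zero feature
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale if m else -math.inf
              for key, m in zip(keys, mask)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # exp(-inf) is 0.0 for masked slots
    norm = sum(exps)
    weights = [e / norm for e in exps]
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(out_dim)]
```

Because masked slots are assigned a score of negative infinity before the softmax, the attention weights of the real target ships still sum to one regardless of how many padded slots are present.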

[0194] In this implementation, the Critic and the target Critic use implicit context features obtained through GRU encoding for value assessment to maintain the stability of the value estimation process. The Actor, on the other hand, uses adaptive context features corrected for residuals to generate actions, thereby enhancing the policy's adaptability to time-varying dynamic characteristics under dynamically changing load conditions. Through the above network structure design, historical dynamic information, multi-target ship interaction information, and implicit features of the current load condition can be comprehensively utilized, thereby improving the collision avoidance decision-making ability and overall robustness of the unmanned surface vessel in scenarios with dynamically changing loads.

[0195] Regarding step 5: In this embodiment, in order to enable the constructed Actor-Critic network that integrates GRU, residual correction and mask attention to be trained stably, value network loss, policy network loss and temperature coefficient loss are designed based on the Soft Actor-Critic (SAC) framework.

[0196] Step 5.1: Design of the Critic loss function.

[0197] In this embodiment, the Critic network consists of two independent value networks, a design intended to mitigate the overestimation that may occur when training a single value network. For the current sample, the two current Critic networks are first used for value estimation. Subsequently, based on the state at the next moment and the adaptive context features, the next action and its log probability are sampled from the Actor network. Then, the target Q-value is calculated using the target Critic network. The Bellman target at the current time can thus be represented as:

[0198]

[0199] where the symbols denote, in order, the discount factor, the entropy temperature coefficient, and the target Critic networks. The loss functions of the two value networks are therefore defined as follows:

[0200]

[0201] Finally, the total Critic loss is:

[0202]

[0203] In the current implementation, the GRU encoder and the Critic network are jointly updated by this loss term. That is, during backpropagation, the Critic parameters and the encoder parameters are optimized simultaneously, so that the implicit context features obtained by the GRU encoder become more conducive to value estimation.
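
As a non-limiting illustration of step 5.1, the target value and the twin-Critic loss can be sketched in the form commonly used with the SAC framework. The sketch operates on plain numbers rather than network outputs; the default discount factor and temperature, the use of a mean squared error without a 1/2 factor, and all identifiers are assumptions made only for this sketch.

```python
from typing import Sequence


def bellman_target(reward: float, done: bool, q1_next: float, q2_next: float,
                   logp_next: float, gamma: float = 0.99, alpha: float = 0.2) -> float:
    """Soft Bellman target using the smaller of the two target Critic outputs."""
    soft_value = min(q1_next, q2_next) - alpha * logp_next
    not_done = 0.0 if done else 1.0  # no bootstrapping after a terminal step
    return reward + gamma * not_done * soft_value


def critic_loss(q1: Sequence[float], q2: Sequence[float],
                targets: Sequence[float]) -> float:
    """Sum of the mean squared errors of the two value networks over a batch."""
    n = len(targets)
    loss1 = sum((q - y) ** 2 for q, y in zip(q1, targets)) / n
    loss2 = sum((q - y) ** 2 for q, y in zip(q2, targets)) / n
    return loss1 + loss2
```

In a gradient-based implementation, the targets would be treated as constants, and the gradient of this loss would flow into both the Critic parameters and the GRU encoder parameters, as described above.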

[0204] Step 5.2: Design of the Actor's basic loss function.

[0205] The optimization objective of the policy network is to maximize the value estimate while maintaining sufficient policy entropy to enhance exploration. For the current sample, the residual-corrected adaptive context features are first used to sample an action and calculate its log probability. The action is then input into the Critic network for value evaluation. Since the Critic network in this implementation performs the evaluation using the implicit context features, the Actor's basic loss function is defined as follows:

[0206]

[0207] The first term is the entropy regularization term, used to encourage the policy to maintain randomness; the second term is the value term, used to guide the policy to generate actions that can obtain higher long-term returns. Through this basic loss term, policy optimization under the standard SAC framework can be achieved.

[0208] Step 5.3: Design of residual correction related regularization terms.

[0209] To prevent the residual correction branch from shifting the implicit context excessively, this implementation introduces a context alignment regularization term and a residual magnitude regularization term in addition to the Actor base loss. First, to keep the adaptive context from deviating too far from the implicit context, the context alignment loss is defined as:

[0210]

[0211] Step 5.4: Design of the total loss function for the Actor.

[0212] After comprehensively considering the strategy optimization objective and residual correction constraints, the total loss function of the Actor in this implementation is defined as follows:

[0213]

[0214] where the coefficient is the context alignment coefficient.
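
As a non-limiting illustration of steps 5.2 to 5.4, the combination of the Actor base loss and the context alignment term can be sketched as follows. The use of the smaller of the two Critic outputs, the mean squared form of the alignment term, and all identifiers are assumptions made only for this sketch; the residual magnitude regularization term mentioned in step 5.3 is omitted because its form is not given in the text.

```python
from typing import Sequence


def actor_base_loss(logp: Sequence[float], q1: Sequence[float],
                    q2: Sequence[float], alpha: float) -> float:
    """Batch mean of the entropy term (alpha * log-probability) minus the value term."""
    n = len(logp)
    return sum(alpha * lp - min(a, b) for lp, a, b in zip(logp, q1, q2)) / n


def context_alignment_loss(z_actor: Sequence[Sequence[float]],
                           z_critic: Sequence[Sequence[float]]) -> float:
    """Mean squared difference between adaptive and implicit context features."""
    per_sample = [sum((a - c) ** 2 for a, c in zip(za, zc)) / len(za)
                  for za, zc in zip(z_actor, z_critic)]
    return sum(per_sample) / len(per_sample)


def actor_total_loss(logp: Sequence[float], q1: Sequence[float], q2: Sequence[float],
                     alpha: float, z_actor: Sequence[Sequence[float]],
                     z_critic: Sequence[Sequence[float]], lambda_align: float) -> float:
    """Actor base loss plus the weighted context alignment term."""
    return (actor_base_loss(logp, q1, q2, alpha)
            + lambda_align * context_alignment_loss(z_actor, z_critic))
```

A larger alignment coefficient keeps the adaptive context closer to the implicit context used by the Critic, at the cost of a smaller correction available to the Actor.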

[0215] Step 5.5: Design of temperature coefficient loss function.

[0216] To enable adaptive adjustment of the policy entropy, this implementation treats the entropy temperature coefficient as a separate learnable parameter and updates it using a target entropy constraint. Given the log probabilities obtained by sampling under the current policy, the empirical entropy of the current batch can be expressed as:

[0217]

[0218] Given the target entropy, the temperature coefficient loss is defined as follows:

[0219]

[0220] Here, the target entropy is set to the negative of the action dimension.
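
As a non-limiting illustration of step 5.5, one common form of the SAC temperature loss can be sketched as follows. Parameterizing the loss by the logarithm of the temperature coefficient is an assumption made only for this sketch, as are all identifiers; the original formula may differ in form.

```python
from typing import Sequence


def empirical_entropy(logp: Sequence[float]) -> float:
    """Batch estimate of the policy entropy: the negative mean log probability."""
    return -sum(logp) / len(logp)


def target_entropy(action_dim: int) -> float:
    """Target entropy set to the negative of the action dimension."""
    return -float(action_dim)


def temperature_loss(log_alpha: float, logp: Sequence[float], action_dim: int) -> float:
    """Loss whose gradient lowers alpha when the entropy exceeds the target."""
    return log_alpha * (empirical_entropy(logp) - target_entropy(action_dim))
```

With this form, the derivative with respect to `log_alpha` equals the gap between the empirical entropy and the target entropy, so gradient descent decreases the temperature when the policy is more random than required and increases it otherwise.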

[0221] Step 5.6: Target Critic soft update design.

[0222] To improve training stability, the target Critic network parameters are soft-updated after each round of parameter updates. Given the current Critic parameters, the target Critic parameters, and the soft update coefficient, the update rule is as follows:

[0223]

[0224] The soft update coefficient is set to 5×10⁻³.

[0225] The soft update mechanism described above allows the target network parameters to smoothly track changes in the current Critic network parameters, thereby reducing the instability caused by target value oscillations during training.
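
As a non-limiting illustration of step 5.6, the soft update can be sketched in its usual exponential moving average form, with the parameters represented as a flat list of numbers. The function name and the flat representation are assumptions made only for this sketch; the default coefficient is the value stated above.

```python
from typing import List, Sequence


def soft_update(target: Sequence[float], online: Sequence[float],
                tau: float = 5e-3) -> List[float]:
    """Move each target parameter a fraction tau toward the current Critic parameter."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]
```

With a coefficient of 5×10⁻³, each update moves the target parameters only 0.5 percent of the way toward the current Critic parameters, which is what makes the target values change smoothly.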

[0226] For step 6: Based on the conventional reinforcement learning experience replay mechanism, an experience buffer pool structure is designed to store state transition information and single-step dynamic context features together. The experience buffer pool stores not only standard transition samples (state, action, reward, and next-moment state) but also single-step context features composed of the current action and the ship's dynamic response, as well as information identifying the round and operating condition to which each sample belongs, thus providing a foundation for subsequent context sequence construction. In this embodiment, the buffer pool capacity is set to 2 × 102 5, the single-step context feature dimension is set to 7, and the corresponding context sequence length is set to a preset value.

[0227] Step 6.1: Design of content storage in the experience buffer pool.

[0228] The experience buffer stores each interaction sample as a tuple whose elements are, in order: the current state, the current action, the immediate reward, the termination flag, the state at the next moment, the operating condition number corresponding to the sample, and the single-step dynamic context feature.

[0229] Step 6.2: Single-step dynamic context feature construction and writing.

[0230] At each interaction time step, this implementation constructs a single-step dynamic context feature based on the current state, current action, and next-time state while writing the transition sample to the experience buffer. This single-step context feature is saved synchronously when the current sample is written to the experience buffer. In this way, the buffer can directly construct the subsequently required context features based on the state transition data without adding any additional output interfaces to the environment.
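
As a non-limiting illustration of step 6.2, the construction of a seven-dimensional single-step context feature can be sketched as follows. The sketch assumes that the feature holds the action, the three velocities after the transition, and the three velocity increments; the ordering, the choice of the next-moment velocities, and all identifiers are assumptions made only for this sketch.

```python
from typing import List, Tuple

Motion = Tuple[float, float, float]  # (surge velocity, sway velocity, yaw rate)


def build_context(action: float, motion: Motion, next_motion: Motion) -> List[float]:
    """Build [action, u', v', r', du, dv, dr] from one state transition."""
    u, v, r = motion
    u2, v2, r2 = next_motion
    return [action, u2, v2, r2, u2 - u, v2 - v, r2 - r]
```

Because the feature depends only on the current state, the action, and the next-moment state, it can be computed when the transition sample is written to the buffer, without any additional output interface on the environment.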

[0231] Step 6.3: Operating condition number marking mechanism.

[0232] To support the subsequent construction of context sequences based on operating conditions, the operating condition number is recorded when each empirical sample is written. Specifically, the main algorithm maps the current load condition label in the environment return information to a discrete operating condition number. By recording these operating condition numbers, dynamic context sequences can be organized using operating conditions as indexes in subsequent sampling phases.

[0233] Step 6.4: Context sampling mechanism based on operating condition category.

[0234] The training phase employs a context sampling method based on operating condition category. Specifically, for a batch of samples, the operating condition number vector is first read. Then, for each operating condition number that appears in the batch, the set of all single-step context features with the same number is retrieved from the experience buffer pool. Subsequently, for each sample belonging to that operating condition, K single-step context features are randomly selected from this set with replacement and combined into a context sequence in sampling order. The sampling batch size is 256.
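
As a non-limiting illustration of step 6.4, the sampling of context sequences by operating condition can be sketched as follows. The in-memory index and all identifiers are assumptions made only for this sketch; a practical buffer would typically maintain the per-condition index incrementally rather than rebuild it.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def index_by_condition(condition_ids: Sequence[int],
                       contexts: Sequence[List[float]]) -> Dict[int, List[List[float]]]:
    """Group the stored single-step context features by operating condition number."""
    pool: Dict[int, List[List[float]]] = defaultdict(list)
    for cid, ctx in zip(condition_ids, contexts):
        pool[cid].append(ctx)
    return pool


def sample_context_sequences(batch_condition_ids: Sequence[int],
                             pool: Dict[int, List[List[float]]],
                             k: int, rng: random.Random) -> List[List[List[float]]]:
    """For each batch sample, draw k context features of its condition with replacement."""
    return [rng.choices(pool[cid], k=k) for cid in batch_condition_ids]
```

Each returned sequence contains only features recorded under the same operating condition as the corresponding batch sample, in the order in which they were drawn.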

[0235] Step 6.5: Online inference context caching mechanism.

[0236] In addition to the experience buffer pool used during the training phase, this invention also maintains a circular context cache of length K during the online inference phase, which stores the single-step context features of the most recent K moments. Each time an environment interaction is executed, a new single-step context feature is constructed based on the current state, the action, and the state at the next moment, and is written to this circular cache. When action inference is needed, the most recent K context features are read from the cache in chronological order and form the current context sequence input to the GRU encoder. When the effective history length in the cache is less than K, the most recent valid context vector is used for padding; when there is no valid history, the zero vector is used for initialization.
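
As a non-limiting illustration of step 6.5, the circular context cache and its padding rule can be sketched as follows. Placing the padding after the stored history is an assumption made only for this sketch, since the text does not state where the padding vectors are inserted; the class and method names are likewise hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence


class ContextCache:
    """Fixed-length cache of the most recent single-step context features."""

    def __init__(self, k: int, dim: int) -> None:
        self.k = k
        self.dim = dim
        self.buf: Deque[List[float]] = deque(maxlen=k)  # oldest entry drops out first

    def push(self, ctx: Sequence[float]) -> None:
        self.buf.append(list(ctx))

    def sequence(self) -> List[List[float]]:
        """Return exactly k context vectors in chronological order."""
        if not self.buf:
            # No valid history: initialize with zero vectors.
            return [[0.0] * self.dim for _ in range(self.k)]
        items = [list(c) for c in self.buf]
        # Fewer than k entries: pad with copies of the most recent valid vector.
        pad = [list(items[-1]) for _ in range(self.k - len(items))]
        return items + pad
```

The sequence returned by `sequence()` always has length K, so it can be passed to the GRU encoder without any length handling on the network side.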

[0237] For step 7: First, initialize the environment, experience buffer, GRU encoder, residual correction branch, Actor network, Critic network, target Critic network, and their corresponding optimizers. Then, the agent interacts with the environment, generating actions based on the current online context at each time step and performing environment updates to obtain new states, rewards, and termination flags. Subsequently, the current state transition samples and their corresponding single-step dynamic context features are written into the experience buffer. When the number of experience samples reaches the preset update condition, a batch of samples and their corresponding context sequences are randomly sampled from the experience buffer. First, implicit context features are generated using the GRU encoder, and then adaptive context features are generated through the residual correction branch. Based on this, the parameters of the Critic network and encoder are updated first, then the parameters of the Actor network and residual correction branch are updated, followed by the temperature coefficient, and a soft update is performed on the target Critic network. The above process is continuously iterated until the preset number of training steps or model convergence conditions are reached, thereby completing the training of the unmanned vessel collision avoidance strategy for dynamically changing load scenarios.

[0238] By introducing GRU implicit context encoding, residual correction mechanism, and masked attention network into the Soft Actor-Critic (SAC) framework, this invention significantly improves the collision avoidance performance, decision stability, and overall robustness of unmanned surface vessels (USVs) under dynamically changing load conditions. This method addresses the limitations of existing collision avoidance strategies, which are mostly based on fixed load assumptions and struggle to adapt to time-varying ship dynamic parameters and model mismatch. It constructs a dynamic context sequence combining historical control inputs and the ship's motion response, and utilizes GRU to extract implicit dynamic context features. This allows the network to indirectly perceive the impact of dynamic load changes on ship maneuvering characteristics, thereby enhancing its ability to represent time-varying dynamic characteristics under complex operating conditions.

[0239] Specifically, this invention uses a GRU context encoder to perform temporal modeling of recent steering control actions, the ship's surge velocity, sway velocity, and yaw rate, and their state increments, extracting implicit context features that represent the current implicit dynamic state. On this basis, a residual correction network generates adaptive context features better suited to action decisions, improving the adaptability of action generation to load changes while maintaining the stability of value estimation. In addition, both the Actor and Critic networks incorporate masked attention mechanisms that apply weighted encoding to the multi-target ship state set, highlighting the impact of high-risk target ships on collision avoidance decisions and value assessment and suppressing the interference of zero-filled target ships on the network output. The resulting network has stronger interaction perception and decision robustness in multi-target ship encounter scenarios.

[0240] In summary, this invention improves the adaptability of action decisions to current operating conditions through dynamic context features and residual correction mechanisms, and enhances the ability to extract key target features in multi-target ship encounter environments through a masked attention mechanism. This enables unmanned vessels to achieve highly safe and robust collision avoidance decisions under conditions of dynamic load changes and complex multi-target ship interactions. This method has promising engineering application prospects and can be widely applied to various autonomous navigation scenarios, including intelligent shipping and unmanned transportation.

[0241] The present invention also provides an electronic device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the above-described robust collision avoidance decision-making method for unmanned vessels that considers dynamic changes in ship load.

[0242] The present invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the above-described robust collision avoidance decision-making method for unmanned vessels that considers dynamic changes in ship loads.

[0243] The above are preferred embodiments of the present invention. Any changes made to the technical solution of the present invention that do not exceed the scope of the technical solution of the present invention shall fall within the protection scope of the present invention.

Claims

1. A robust collision avoidance decision-making method for unmanned surface vessels considering changes in ship load, characterized in that it comprises: constructing a collision avoidance strategy state space, which includes the state vector of the ship itself, the state set of the target ships, and the state vector of the target point; constructing a dynamic context sequence, which is composed of the single-step dynamic context vectors of the most recent K moments arranged in chronological order, each single-step dynamic context vector including the current steering control action, the ship's surge velocity, sway velocity, and yaw rate, and the corresponding state increments; temporally encoding the dynamic context sequence using a gated recurrent unit to obtain an implicit context feature vector; subjecting the implicit context feature vector to residual correction to obtain adaptive context features; inputting the collision avoidance strategy state space and the adaptive context features into the Actor policy network to output the collision avoidance control action; and mapping the collision avoidance control action into a rudder angle control command and outputting it to the servo actuator.

2. The robust collision avoidance decision-making method for unmanned surface vessels considering changes in ship load, as described in claim 1, characterized in that the ship's state vector is obtained by normalizing the ship's surge velocity, sway velocity, and yaw rate, respectively; in the calculation formula, the three velocity terms are the ship's surge velocity, sway velocity, and yaw rate, and the three reference terms are the corresponding maximum reference values.

3. The robust collision avoidance decision-making method for unmanned surface vessels considering changes in ship load as described in claim 1, characterized in that the target point state vector is constructed from the relative bearing of the target point with respect to the ship and the relative heading angle of the target point with respect to the ship's current heading. [Calculation formula and symbols not reproduced in this text.]

4. The robust collision avoidance decision-making method for unmanned surface vessels considering changes in ship load as described in claim 1, characterized in that the state vector of a single target ship includes the relative distance, relative bearing, and relative heading angle between the own ship and the corresponding target ship, together with the collision risk index, the encounter type, and the target ship's speed. The collision risk index is calculated from the weight vector of the risk factors and the fuzzy membership vector of the risk factors, where the weights correspond, respectively, to the distance at closest point of approach (DCPA), the time to closest point of approach (TCPA), the relative bearing, the relative distance, and the relative heading; the memberships are the risk memberships of the same five factors; and T denotes transpose. [Calculation formula and symbols not reproduced in this text.]
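
As a non-limiting illustration, the collision risk index of claim 4 can be read as the inner product of the weight vector and the fuzzy membership vector over the five risk factors. The function name and the numeric weights below are hypothetical values chosen only to make the sketch executable; the membership functions themselves are not defined in this section and are therefore taken as inputs.

```python
from typing import Sequence

def collision_risk_index(weights: Sequence[float],
                         memberships: Sequence[float]) -> float:
    """Weighted sum of risk memberships (weight vector times membership vector)."""
    if len(weights) != len(memberships):
        raise ValueError("weights and memberships must have the same length")
    return sum(w * m for w, m in zip(weights, memberships))

# Hypothetical weights, ordered as: closest-approach distance, closest-approach
# time, relative bearing, relative distance, relative heading.
EXAMPLE_WEIGHTS = (0.4, 0.3, 0.1, 0.1, 0.1)
```

With weights that sum to one and memberships in the range 0 to 1, the resulting index also stays in that range, which makes it convenient as a sorting key for the target ships.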

5. The robust collision avoidance decision-making method for unmanned surface vessels considering changes in ship load as described in claim 1 or 4, characterized in that the target ship state set consists of the top N target ship state vectors sorted in descending order of collision risk index, where N is the preset maximum number of target ships; if the current number of target ships is less than N, the missing entries are filled with zero vectors; if the number exceeds N, the first N are taken. The length of the dynamic context sequence is unified as follows: when the number of valid historical context vectors is less than K, the sequence is filled with the most recent valid context vector; when there is no valid historical context information, the dynamic context sequence is initialized with zero vectors.
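
As a non-limiting illustration, the two fixed-length rules of claim 5 can be sketched as follows. The function names, the position of the collision risk index inside each target vector (`cri_index`), and the choice to append the repeated context vector at the end of the sequence are assumptions; the claim does not state where the fill values are placed.

```python
from typing import List, Sequence

def build_target_set(targets: Sequence[Sequence[float]], n_max: int,
                     cri_index: int, dim: int) -> List[List[float]]:
    """Keep the n_max highest-risk targets; fill missing slots with zero vectors."""
    ranked = sorted(targets, key=lambda t: t[cri_index], reverse=True)[:n_max]
    padding = [[0.0] * dim for _ in range(n_max - len(ranked))]
    return [list(t) for t in ranked] + padding

def pad_context(history: Sequence[Sequence[float]], k: int,
                dim: int) -> List[List[float]]:
    """Return exactly k context vectors in chronological order."""
    if not history:
        # No valid history: initialize with zero vectors.
        return [[0.0] * dim for _ in range(k)]
    recent = [list(step) for step in history[-k:]]
    # Fewer than k valid vectors: repeat the most recent one (assumed: at the end).
    fill = [list(recent[-1]) for _ in range(k - len(recent))]
    return recent + fill
```

Both helpers always return a list of fixed length, so the downstream networks can be given inputs of constant shape regardless of how many target ships or history steps are available.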

6. The robust collision avoidance decision-making method for unmanned surface vessels considering changes in ship load as described in claim 1, characterized in that the residual correction includes the following steps: flattening the dynamic context sequence into a one-dimensional vector and concatenating it with the implicit context feature vector to obtain the input vector of the residual correction network; inputting this input vector into the residual correction network to obtain the original correction amount, which is then compressed by a hyperbolic tangent and scaled to obtain the context correction amount; and performing residual superposition of the implicit context feature vector and the context correction amount to obtain the adaptive context feature. The quantities appearing in the calculation formula are, in order: the dynamic context sequence, the flattened dynamic context vector, the implicit context feature vector, the input vector of the residual correction network, the original correction amount, the residual correction network, the preset correction scaling coefficient, the context correction amount, and the adaptive context feature. [Calculation formula and symbols not reproduced in this text.]
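
As a non-limiting illustration, the residual correction steps of claim 6 (flatten, concatenate, network, tanh compression, scaling, residual addition) can be sketched as below. The function name, the argument names, and the representation of the residual correction network as an arbitrary callable `net` are assumptions of this sketch.

```python
import math
from typing import Callable, List, Sequence

def residual_correction(context_seq: Sequence[Sequence[float]],
                        h: Sequence[float],
                        net: Callable[[List[float]], Sequence[float]],
                        scale: float) -> List[float]:
    """Return h + scale * tanh(net([flatten(context_seq), h]))."""
    flat = [x for step in context_seq for x in step]   # flatten K x d into K*d
    net_input = flat + list(h)                         # concatenate with h
    raw = net(net_input)                               # original correction amount
    if len(raw) != len(h):
        raise ValueError("network output must match the feature dimension")
    delta = [scale * math.tanh(v) for v in raw]        # bounded context correction
    return [hi + di for hi, di in zip(h, delta)]       # residual superposition
```

Because tanh is bounded by 1 in magnitude, each component of the correction is limited to the scaling coefficient, so the adaptive feature cannot drift arbitrarily far from the GRU feature.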

7. The robust collision avoidance decision-making method for unmanned surface vessels considering changes in ship load as described in claim 1, characterized in that the steps by which the Actor policy network outputs the collision avoidance control action include: splitting the collision avoidance strategy state space into the own-ship state vector, the target point state vector, and the target ship state set; weighting and encoding the target ship state set using a masked attention module to obtain a target ship interaction feature vector; concatenating the own-ship state vector, the target point state vector, the target ship interaction feature vector, and the adaptive context features to obtain a policy fusion feature; and inputting the policy fusion feature into a multilayer perceptron to obtain action distribution parameters, generating a Gaussian action sample, and then obtaining the collision avoidance control action through hyperbolic tangent mapping.
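
As a non-limiting illustration, two building blocks of claim 7 can be sketched in plain Python: a masked attention pooling over the target ship set, and a Gaussian action sample followed by hyperbolic tangent mapping. The sketch assumes scaled dot-product scoring against a hypothetical query vector and uses the raw target features as both keys and values; the actual module would normally contain learned projections that are not described in this section.

```python
import math
import random
from typing import List, Sequence

def masked_attention(query: Sequence[float],
                     targets: Sequence[Sequence[float]],
                     mask: Sequence[bool]) -> List[float]:
    """Softmax-weighted sum of target vectors; masked (padded) slots get weight 0."""
    dim = len(query)
    if not any(mask):
        return [0.0] * dim                      # no real target ship present
    scale = math.sqrt(dim)
    scores = [
        sum(q * t for q, t in zip(query, tgt)) / scale if valid else -math.inf
        for tgt, valid in zip(targets, mask)
    ]
    top = max(scores)                           # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * tgt[j] for w, tgt in zip(weights, targets)) for j in range(dim)]

def sample_action(mean: float, log_std: float, rng: random.Random) -> float:
    """Draw a Gaussian sample and map it into [-1, 1] with tanh."""
    eps = rng.gauss(0.0, 1.0)
    return math.tanh(mean + math.exp(log_std) * eps)
```

The mask is what allows zero-filled slots in the target ship set to be ignored: a padded entry receives a score of negative infinity and therefore contributes nothing to the interaction feature.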

8. The robust collision avoidance decision-making method for unmanned surface vessels considering changes in ship load as described in claim 1, characterized in that the rudder angle control command is obtained by linearly mapping the collision avoidance control action to the target rudder angle, using the minimum and maximum values of the rudder angle. [Calculation formula and symbols not reproduced in this text.]
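
As a non-limiting illustration, the linear mapping of claim 8 can be sketched under the assumption that the control action lies in the interval [-1, 1], which is what a hyperbolic tangent output would produce. The function name, the argument names, and the rudder limits used in the usage note are hypothetical.

```python
def action_to_rudder(action: float, delta_min: float, delta_max: float) -> float:
    """Map an action in [-1, 1] linearly onto [delta_min, delta_max]."""
    if not -1.0 <= action <= 1.0:
        raise ValueError("action must lie in [-1, 1]")
    return delta_min + 0.5 * (action + 1.0) * (delta_max - delta_min)
```

With hypothetical limits of -35 and 35 degrees, an action of -1 gives the minimum rudder angle, an action of 1 gives the maximum, and an action of 0 gives the midpoint.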

9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when the processor executes the computer program, it implements the robust collision avoidance decision-making method for unmanned surface vessels considering changes in ship load as described in any one of claims 1 to 8.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the robust collision avoidance decision-making method for unmanned surface vessels considering changes in ship load as described in any one of claims 1 to 8.