Ship berthing control method and device based on a combined model-based and model-free control game
By combining model-based and model-free control in a control game, and using deep reinforcement learning and game theory to generate the ship berthing control commands, the method addresses the accuracy and robustness problems of ship berthing under complex sea conditions and achieves high-precision, highly robust berthing control.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHIJIAZHUANG TIEDAO UNIV
- Filing Date
- 2026-04-22
- Publication Date
- 2026-06-30
AI Technical Summary
Existing ship berthing control methods struggle to achieve high-precision and robust smooth berthing under complex sea conditions. Single model-based control methods are highly dependent on model accuracy, while model-free control methods lack robustness in the face of dynamically changing marine environments.
A method based on a combined model-based and model-free control game is adopted. A deep reinforcement learning algorithm and game theory produce the fusion weights for the model-free and model-based control commands; the two commands are weighted accordingly to form the final control command, which is then output to the ship's actuators.
It achieves high-precision and robust control for ship berthing under complex sea conditions, integrating the stability of model-based control with the adaptability of model-free control, thereby improving control adaptability and accuracy.
Figure CN122308378A_ABST (abstract drawing)
Description
Technical Field
[0001] This invention relates to the field of ship control technology, and in particular to a ship berthing control method and apparatus based on a combined model-based and model-free control game.
Background Technology
[0002] Ship berthing is a critical part of shipping operations. The marine environment is characterized by complex and ever-changing disturbances such as wind, waves, and currents, which places stringent requirements on the motion control of ships during berthing, demanding high precision and robustness. How to achieve smooth and precise berthing control of ships under complex sea conditions has become an important research direction in the field of ship control technology.
[0003] In the field of ship berthing control, two main technical solutions are currently used. One is the model-based control method, represented by the backstepping method, which designs control strategies based on the ship's dynamics model. The other is the model-free control method, represented by iterative learning control, which learns from historical operating data and iteratively optimizes the control effect. Both methods are applied separately in the motion control practice of ship berthing.
[0004] Model-based control methods are highly dependent on the accuracy of the system model. When the model parameters are uncertain or the ship is subject to unknown environmental disturbances during navigation, control performance drops significantly and it becomes difficult to adapt to complex sea conditions. Model-free control methods can improve the control effect by learning from historical data, but they converge slowly and are not robust enough in a dynamically changing marine environment. No single control method can meet the control requirements of ship berthing under complex sea conditions.
Summary of the Invention
[0005] This invention provides a ship berthing control method and apparatus based on a combined model-based and model-free control game, to address the problem that existing single control methods cannot meet the safety, stability, and accuracy requirements of ship berthing under complex sea conditions.
[0006] In a first aspect, embodiments of the present invention provide a ship berthing control method based on a combined model-based and model-free control game, comprising: obtaining the ship's current motion state and a desired reference trajectory; generating a model-free control command and a model-based control command based on the ship's current motion state and the desired reference trajectory; outputting fusion weights for the model-free control command and the model-based control command based on a deep reinforcement learning algorithm and game theory; and performing weighted fusion of the model-free control command and the model-based control command according to the fusion weights to obtain a final control command, which is output to the ship's actuators.
[0007] In one possible implementation, generating the model-free control command and the model-based control command based on the ship's current motion state and the desired reference trajectory comprises: generating the model-free control command with a fractional-order iterative learning controller, whose update rate is of fractional order; and generating the model-based control command with a backstepping controller, which is designed on the basis of the ship's nonlinear dynamics model.
[0008] In one possible implementation, generating the model-free control command with the fractional-order iterative learning controller based on the ship's current motion state and the desired reference trajectory comprises: updating the control law through iterative learning based on the ship's current motion state and the desired reference trajectory; and generating the model-free control command from the tracking error of the current batch, the iterative learning control input of the previous batch, and the adjustment applied by the fractional-order operator matrices.
[0009] In one possible implementation, updating the control law through iterative learning and generating the model-free control command from the current-batch tracking error, the previous-batch iterative learning control input, and the fractional-order operator matrices comprises generating the model-free control command according to an update formula [formula not reproduced]. The quantities appearing in the formula are:
- the model-free control command generated by iterative learning for the current batch at a given time;
- the model-free control command generated by iterative learning for the previous batch at that time;
- the sampling time or sample index;
- the update rate of iterative learning for that batch and time;
- two fractional-order operator matrices;
- the tracking error of that batch with respect to the desired reference trajectory at that time, formed from the desired reference trajectory and the actual output trajectory;
- the tracking error of the motion state variables at that time, formed from two motion state values.
[0010] In one possible implementation, generating the model-based control command with the backstepping controller based on the ship's current motion state and the desired reference trajectory comprises: performing a global diffeomorphic transformation on the ship's heading control system, then constructing the virtual control laws step by step, defining the error variables, and designing the Lyapunov functions, thereby obtaining the model-based control command.
[0011] In one possible implementation, the global diffeomorphic transformation sets three state variables [formula not reproduced]: a first-order, a second-order, and a third-order state variable, defined in terms of the ship's heading angle, the ship's yaw rate, and a first derivative. After the transformation, the nonlinear dynamic model of the ship's heading motion takes a new form [formula not reproduced], in which the following quantities appear: the first derivatives with respect to time of the three state variables; the estimated value of the nonlinear term; the actual value of the nonlinear term; the control gain coefficient; the input quantity of the backstepping controller; the ship's nonlinear hydrodynamic coefficient; the ship's maneuvering time constant; and the steering-gear (servo) time constant.
[0012] In one possible implementation, constructing the virtual control laws step by step, defining the error variables, and designing the Lyapunov functions to obtain the model-based control command comprises: based on the transformed nonlinear dynamic model of the ship's heading motion, defining a first-level error variable with the desired heading angle as the reference, constructing a first-level virtual control law matching the first-level error variable, and designing a first-level Lyapunov function; based on the first-level virtual control law, defining a second-level error variable and designing a second-level Lyapunov function; and constructing a second-level virtual control law matching the second-level error variable, defining a third-level error variable, and deriving the model-based control command in combination with the nonlinear dynamic model of the ship's heading motion.
[0013] In one possible implementation, based on deep reinforcement learning algorithms and game theory, the fusion weights for the model-free control instructions and the model-based control instructions are output, including: The fusion weights are solved by integrating incomplete information game theory and deep reinforcement learning algorithms, using the change in ship state error as the payoff function.
[0014] In one possible implementation, after the final control command is obtained and output to the ship's actuators, the method further comprises: correcting the learning strategy based on the ship's new motion state after the final control command has been executed, so as to achieve closed-loop optimized control of the ship's berthing.
[0015] In a second aspect, embodiments of the present invention provide a ship berthing control device based on a combined model-based and model-free control game, comprising: an acquisition module, used to acquire the ship's current motion state and the desired reference trajectory; a controller module, used to generate the model-free control command and the model-based control command based on the ship's current motion state and the desired reference trajectory; a decision module, used to output fusion weights for the model-free control command and the model-based control command based on a deep reinforcement learning algorithm and game theory, and further configured to perform weighted fusion of the two commands according to the fusion weights to obtain the final control command; and an output module, used to output the final control command obtained by weighted fusion to the ship's actuators.
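The module structure described above can be pictured as a small pipeline. The following is only a minimal sketch under stated assumptions: the class name `BerthingControlDevice`, its field names, scalar commands, and the stub callables are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class BerthingControlDevice:
    """Hypothetical wiring of the four modules; every callable is a placeholder."""
    acquire: Callable[[], Tuple[Vector, Vector]]      # acquisition module -> (state, reference)
    model_free: Callable[[Vector, Vector], float]     # controller module, model-free branch
    model_based: Callable[[Vector, Vector], float]    # controller module, model-based branch
    decide_weights: Callable[[float, float, Vector, Vector], Tuple[float, float]]
    actuate: Callable[[float], None]                  # output module

    def step(self) -> float:
        """Run one control cycle and return the fused command."""
        state, reference = self.acquire()
        u_mf = self.model_free(state, reference)
        u_mb = self.model_based(state, reference)
        w_mf, w_mb = self.decide_weights(u_mf, u_mb, state, reference)
        u_final = w_mf * u_mf + w_mb * u_mb           # linear weighted fusion
        self.actuate(u_final)
        return u_final


# Stub modules with made-up numbers, only to show the data flow.
sent = []
device = BerthingControlDevice(
    acquire=lambda: ([0.0], [1.0]),
    model_free=lambda s, r: 2.0,
    model_based=lambda s, r: 4.0,
    decide_weights=lambda a, b, s, r: (0.25, 0.75),
    actuate=sent.append,
)
u = device.step()
```

Passing the modules in as callables keeps the sketch independent of any particular controller or learning algorithm.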
[0016] This invention provides a ship berthing control method and apparatus based on a combined model-based and model-free control game. The method acquires the ship's current motion state and the desired reference trajectory; generates a model-free control command and a model-based control command from them; outputs fusion weights for the two commands based on a deep reinforcement learning algorithm and game theory; and performs weighted fusion of the two commands according to these weights to obtain the final control command, which is output to the ship's actuators. Because both commands are generated simultaneously, the invention combines the advantages of the two control methods: model-based control provides the basic stability of the system, and model-free control using fractional-order iterative learning provides adaptive optimization, overcoming the performance deficiencies of the single control methods of the prior art. The fusion weights are solved dynamically using deep reinforcement learning and game theory rather than fixed in advance, so they adjust with the ship's state and the sea state and adapt to a complex, changing marine environment. Dynamically weighting and fusing the two commands gives the final control command both stability and self-learning tracking capability, which effectively improves the accuracy and robustness of ship berthing control under complex sea conditions. The whole process is driven by the ship's real-time state and the desired trajectory, with each step feeding the next, so the control better matches the practical needs of ship berthing and overall control adaptability improves.
Description of the Drawings
[0017] To more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0018] Figure 1 is a flowchart of the ship berthing control method based on a combined model-based and model-free control game provided in an embodiment of the present invention; Figure 2 is a schematic structural diagram of the ship berthing control device based on a combined model-based and model-free control game provided in an embodiment of the present invention; Figure 3 is a schematic diagram of an electronic device provided in an embodiment of the present invention.
Detailed Implementation
[0019] The embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
[0020] Figure 1 shows the flowchart of the ship berthing control method based on a combined model-based and model-free control game provided in an embodiment of the present invention. The details are as follows:
Step 101: Obtain the ship's current motion state and desired reference trajectory.
[0021] This step is the foundational sensing stage for ship berthing control. It involves collecting real-time navigation data of the ship through onboard sensing equipment and determining the desired reference trajectory required for the ship's berthing operation, providing basic data support for the generation of the two types of control commands that follow.
[0022] To acquire the ship's current motion state, all data generated during the berthing and navigation of the model ship are first collected through the onboard sensors and monitoring equipment, and a ship navigation state dataset is built from them. The core state parameters required for berthing control are then extracted from this dataset; these may include the ship's position, heading angle, longitudinal speed, lateral speed, and yaw rate. Related data such as ship speed and roll rate can be extracted at the same time to form the complete current motion state. In this step the measurement error of the sensing equipment itself is neglected, so the state data are treated as valid.
[0023] The desired reference trajectory is a standard navigation trajectory preset for the berthing operation. It includes key reference information such as the desired position and desired heading angle at each stage of the berthing process. The desired heading angle can be the reference angle set by line-of-sight (LOS) guidance or a known heading-angle reference value preset according to the berthing requirements. The desired reference trajectory is the control target for berthing and provides the benchmark for error evaluation and trajectory tracking when the control commands are generated.
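As an illustration of how a LOS reference heading might be computed, the sketch below uses the common lookahead-based LOS rule for a straight path segment. The text does not state which LOS variant is used, so the function name, the waypoint representation, and the `lookahead` parameter are assumptions made for this example.

```python
import math


def los_desired_heading(ship_xy, wp_from, wp_to, lookahead):
    """Lookahead-based LOS heading (rad) for the straight segment wp_from -> wp_to."""
    dx = wp_to[0] - wp_from[0]
    dy = wp_to[1] - wp_from[1]
    path_angle = math.atan2(dy, dx)
    # Cross-track error: signed lateral offset of the ship from the path.
    cross_track = (-(ship_xy[0] - wp_from[0]) * math.sin(path_angle)
                   + (ship_xy[1] - wp_from[1]) * math.cos(path_angle))
    # Steer back toward the path; a larger lookahead gives a gentler correction.
    return path_angle + math.atan2(-cross_track, lookahead)


# On the path the desired heading equals the path direction; off the path it
# points back toward the path.
on_path = los_desired_heading((3.0, 0.0), (0.0, 0.0), (10.0, 0.0), lookahead=5.0)
off_path = los_desired_heading((0.0, 5.0), (0.0, 0.0), (10.0, 0.0), lookahead=5.0)
```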
[0024] The ship's current motion state and desired reference trajectory obtained in this step will be synchronously transmitted to the subsequent iterative learning controller and backstepping controller, serving as the core input data for the two types of controllers to generate corresponding control commands.
[0025] Step 102: Generate model-free control commands and model-based control commands based on the ship's current motion state and the desired reference trajectory.
[0026] This step uses the ship's current motion state and desired reference trajectory obtained in step 101 as the core inputs. It generates model-free control commands and model-based control commands in parallel through a fractional-order iterative learning controller and a backstepping controller, respectively, to provide dual-path control inputs for subsequent weighted fusion of commands. The model-free control commands rely on historical data to learn and iterate to achieve trajectory tracking, while the model-based control commands are derived based on the ship's nonlinear dynamics model to ensure system stability. The two types of commands complement each other by taking advantage of their respective strengths.
[0027] In one embodiment, generating the model-free control commands and model-based control commands based on the ship's current motion state and the desired reference trajectory includes: generating the model-free control commands with a fractional-order iterative learning controller, whose update rate is of fractional order; and generating the model-based control commands with a backstepping controller, which is designed on the basis of the ship's nonlinear dynamics model.
[0028] Specifically, based on the ship's current motion state and the desired reference trajectory, a fractional-order iterative learning controller is used to generate model-free control commands. This can include: updating the control law through iterative learning based on the ship's current motion state and the desired reference trajectory; and generating model-free control commands based on the current batch tracking error, the previous batch iterative learning control input, and adjustments to the fractional-order operator matrix. The introduction of fractional-order parameters expands the parameter tuning range and improves control flexibility and system dynamic performance adjustment capabilities.
[0029] Optionally, the iteration-related parameters are defined first: the batch number; the sampling time or sample index (described as time in this embodiment); the model-free control command generated by iterative learning for the current batch at a given time; the model-free control command generated by iterative learning for the previous batch at that time; and the update rate of iterative learning for that batch and time. The update rate is of fractional order, and two fractional-order operator matrices are introduced to shape its properties.
[0030] Then, based on the iteration-related parameters defined above, the tracking error and the state-variable error are calculated. The tracking error of a batch with respect to the desired reference trajectory at a given time [formula not reproduced] is formed from the desired reference trajectory and the actual output trajectory for that batch and time. For the system state variables, the tracking error of the motion state variables [formula not reproduced] is formed from two motion state values at that time, where the motion state value [formula not reproduced] comprises the ship's speed, the roll rate, and the ship's heading angle.
[0031] Finally, the command update is completed according to the iterative learning control law. The tracking error, the control input of the previous batch, and the fractional-order update rate are combined in the control-law formula [formula not reproduced], and iterating this calculation yields the model-free control command for the current batch at each time. The quantities in the formula are those defined above: the commands of the current and previous batches, the sampling time or sample index, the update rate, and the two fractional-order operator matrices.
[0032] Fractional-order parameters allow a wider tuning range, more flexible control, and finer adjustment of the dynamic performance of the ship system, which helps achieve optimal control results and ultimately yields the model-free control input command of the iterative learning control.
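To make the idea of a fractional-order learning update concrete, the sketch below applies a Grünwald-Letnikov approximation of the fractional derivative to the previous batch's tracking error. It is a simplified stand-in, not the patent's update law: a scalar `gain` and a single `order` replace the update rate and the fractional-order operator matrices, the state-variable error term is omitted, and all function names are hypothetical.

```python
def gl_weights(order, n):
    """Grünwald-Letnikov weights w_0..w_{n-1} for a derivative of the given order."""
    w = [1.0]
    for j in range(1, n):
        w.append(w[-1] * (1.0 - (order + 1.0) / j))
    return w


def gl_derivative(signal, order, dt):
    """Approximate the fractional derivative of a sampled signal at every sample."""
    w = gl_weights(order, len(signal))
    return [
        sum(w[j] * signal[i - j] for j in range(i + 1)) / dt ** order
        for i in range(len(signal))
    ]


def ilc_update(u_prev, e_prev, gain, order, dt):
    """One batch update (assumed form): u_next(t) = u_prev(t) + gain * D^order e_prev(t)."""
    d_e = gl_derivative(e_prev, order, dt)
    return [u + gain * de for u, de in zip(u_prev, d_e)]


# order = 0 reduces to a P-type update, order = 1 to a backward-difference D-type update.
ramp_rate = gl_derivative([0.0, 1.0, 2.0, 3.0], order=1.0, dt=1.0)
u_next = ilc_update([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], gain=0.5, order=0.0, dt=0.1)
half_order = ilc_update([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], gain=0.5, order=0.5, dt=0.1)
```

Orders between 0 and 1 interpolate between the two classical update types, which is the extra tuning freedom the paragraph above refers to.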
[0033] The backstepping controller is designed on the basis of the nonlinear dynamic model of the ship's heading motion. With minimizing the ship's heading error as the core objective, it performs a global diffeomorphic transformation on the ship's heading control system and then applies a step-by-step recursive design, constructing virtual control laws, defining error variables, and designing Lyapunov functions, to derive a model-based control command with theoretically guaranteed stability.
[0034] In one embodiment, generating the model-based control commands with a backstepping controller based on the ship's current motion state and the desired reference trajectory may include: performing a global diffeomorphic transformation on the ship's heading control system, then constructing the virtual control laws step by step, defining the error variables, and designing the Lyapunov functions, thereby obtaining the model-based control commands.
[0035] Optionally, a global diffeomorphic transformation is performed on the ship's heading control system and three state variables are set [formula not reproduced]: a first-order, a second-order, and a third-order state variable, defined in terms of the ship's heading angle, the ship's yaw rate, and a first derivative. After the transformation, the nonlinear dynamic model of the ship's heading motion takes a regular cascade form [formula not reproduced], whose terms are as follows:
- The first derivative of the first-order state variable with respect to time is the rate of change of the heading angle, i.e., the ship's yaw rate, and corresponds to the second-order state variable.
- The first derivative of the second-order state variable with respect to time is the rate of change of the yaw rate, i.e., the ship's yaw angular acceleration, and corresponds to the third-order state variable.
- The first derivative of the third-order state variable with respect to time represents the rate of change of the rudder angle, i.e., the angular velocity of the steering gear.
- The estimated value of the nonlinear term is the online estimate of the unknown or unmodeled nonlinear dynamics in the ship's heading system; it compensates for model uncertainty in the backstepping controller and ensures control robustness.
- The actual value of the nonlinear term covers the known nonlinear dynamics in the ship's heading system, such as hydrodynamic damping and moment of inertia.
- The control gain coefficient characterizes how strongly the control input affects the rate of change of the rudder angle; it depends on the performance of the steering gear and the hydrodynamic characteristics of the ship.
- The input quantity of the backstepping controller; the controller's output includes the model-based control command, such as a steering-gear control signal used to drive the rudder blade.
- The nonlinear hydrodynamic coefficient of the ship characterizes the nonlinear damping of the ship's heading system and depends on hull shape and speed.
- The ship's maneuvering time constant is the inherent time constant of the ship's heading system and reflects how quickly the ship responds to the rudder angle.
- The steering-gear (servo) time constant is the time constant of the steering actuator and reflects its response speed.
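For orientation, one widely used heading model that is consistent with the parameters listed above (a nonlinear hydrodynamic coefficient, a maneuvering time constant, and a steering-gear time constant) is the Norrbin model with first-order steering-gear dynamics. Whether the original formula has exactly this form is an assumption; the symbols $\psi$, $r$, $\delta$, $K$, $K_E$, $T$, $T_E$, $\alpha$, $u$, $f$, and $b$ are introduced here purely for illustration. Under that assumption, choosing the heading angle, yaw rate, and yaw acceleration as states gives a cascade form:

```latex
\begin{aligned}
&T\dot r + r + \alpha r^{3} = K\delta, \qquad
 T_E\dot\delta + \delta = K_E u, \qquad \dot\psi = r,\\[2pt]
&x_1 = \psi,\quad x_2 = r,\quad x_3 = \dot r
\;\Longrightarrow\;
\dot x_1 = x_2,\quad \dot x_2 = x_3,\quad \dot x_3 = f(x) + b\,u,\\[2pt]
&f(x) = -\Big(\tfrac{1}{T}+\tfrac{1}{T_E}\Big)x_3
        - \tfrac{3\alpha}{T}\,x_2^{2}x_3
        - \tfrac{1}{T\,T_E}\big(x_2+\alpha x_2^{3}\big),
\qquad b = \tfrac{K K_E}{T\,T_E}.
\end{aligned}
```

The last line follows from differentiating the first equation with respect to time and substituting $\dot\delta = (K_E u - \delta)/T_E$ together with $\delta = (T x_3 + x_2 + \alpha x_2^{3})/K$.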
[0036] In one embodiment, constructing the virtual control laws step by step, defining the error variables, and designing the Lyapunov functions to obtain the model-based control commands may include the following. Based on the transformed nonlinear dynamic model of the ship's heading motion, a first-level error variable is defined with the desired heading angle as the reference, a first-level virtual control law matching the first-level error variable is constructed, and a first-level Lyapunov function is designed; the first-level Lyapunov function is used to verify the stability of the first-level subsystem. Based on the first-level virtual control law, a second-level error variable is defined and a second-level Lyapunov function is designed, so that the stability of the second-level subsystem can be verified on the basis of the stability of the first-level subsystem. Based on the second-level virtual control law, a third-level error variable is defined, and the model-based control command is derived in combination with the nonlinear dynamic model of the ship's heading motion.
[0037] Optionally, a state variable is taken as the control variable of the first-level subsystem [formula not reproduced], and the first-level error variables are defined [formula not reproduced] in terms of the ship's desired heading angle and its first derivative. The desired heading angle can be a reference angle set by LOS guidance or a known heading-angle reference value.
[0038] The matching first-level virtual control law is then constructed [formula not reproduced]; it contains the positive-valued feedback gain coefficient of the first-level subsystem.
[0039] From the first-level error variable and the first-level virtual control law, the first derivative of the first-level error variable is obtained [formula not reproduced], and the first-level Lyapunov function is defined [formula not reproduced].
[0040] Next, a further state variable is taken as the control variable of the next subsystem [formula not reproduced], and the second-level error variables are defined [formula not reproduced]; their expressions involve a second derivative and a first derivative of previously defined quantities. The second-level Lyapunov function is defined [formula not reproduced], and the second-level virtual control law is set [formula not reproduced] with the positive-valued feedback gain coefficient of the second-level subsystem, from which the first derivative of the Lyapunov function is obtained [formula not reproduced]. Proceeding in this way, the backstepping method designs the error variables and Lyapunov functions level by level and derives the virtual control laws and the actual control law that stabilize the system state, yielding the model-based control command.
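The recursive construction can be sketched in code using the textbook backstepping design for a three-state cascade system x1' = x2, x2' = x3, x3' = f(x) + b*u. This is an illustration under assumptions, not the patent's control law: the reference heading is held constant, the nonlinear term `f` is assumed to be known exactly (no online estimate), and the plant coefficients are made-up values.

```python
def backstepping_command(x, psi_d, f, b, c=(2.0, 2.0, 2.0)):
    """Control input for x1'=x2, x2'=x3, x3'=f(x)+b*u with a constant reference psi_d."""
    x1, x2, x3 = x
    c1, c2, c3 = c
    z1 = x1 - psi_d                       # level-1 error (heading error)
    alpha1 = -c1 * z1                     # level-1 virtual control
    z2 = x2 - alpha1                      # level-2 error
    alpha1_dot = -c1 * x2                 # derivative of alpha1 (psi_d constant)
    alpha2 = -z1 - c2 * z2 + alpha1_dot   # level-2 virtual control
    z3 = x3 - alpha2                      # level-3 error
    alpha2_dot = -x2 - c2 * (x3 - alpha1_dot) - c1 * x3
    # With V = (z1^2 + z2^2 + z3^2) / 2 this choice gives
    # V' = -c1*z1^2 - c2*z2^2 - c3*z3^2 <= 0.
    return (-f(x) - z2 - c3 * z3 + alpha2_dot) / b


def simulate(psi_d=0.5, dt=0.01, steps=2000):
    """Forward-Euler simulation of the closed loop with illustrative coefficients."""
    T, T_E, a, b = 10.0, 2.0, 0.5, 0.05   # hypothetical plant values

    def f(x):
        return (-(1 / T + 1 / T_E) * x[2]
                - (3 * a / T) * x[1] ** 2 * x[2]
                - (x[1] + a * x[1] ** 3) / (T * T_E))

    x = [0.0, 0.0, 0.0]
    for _ in range(steps):
        u = backstepping_command(x, psi_d, f, b)
        x = [x[0] + dt * x[1], x[1] + dt * x[2], x[2] + dt * (f(x) + b * u)]
    return x


final_state = simulate()
```

In error coordinates the closed loop is linear with all eigenvalues having real part -2 for the default gains, so the heading converges to the reference in this idealized setting.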
[0041] The ship's current actual state is obtained from the shipborne sensing equipment. Neglecting the measurement errors of that equipment, the data are passed to the backstepping controller and combined with the reference target state to obtain the control command of the model-based controller. Following the control law of the fractional-order iterative learning controller, and using the state variables of the batch together with the configured constraints, the control command of the model-free controller is obtained by solving.
[0042] The model-free and model-based control instructions generated in this step will be synchronously transmitted to the subsequent decision-making module, providing a basis for the calculation of fusion weights and the weighted fusion of instructions.
[0043] Step 103: Based on deep reinforcement learning algorithms and game theory, output the fusion weights for the model-free control instructions and model-based control instructions.
[0044] This step takes the model-free control instructions and model-based control instructions generated in step 102 as inputs to construct a decision-making module that integrates deep reinforcement learning and game theory. Through closed-loop optimization of game decision-making and reinforcement learning, the optimal fusion weights of the two types of control instructions are dynamically solved, realizing the adaptive complementarity of the stability advantage of model-based control and the self-learning advantage of model-free control, and providing a decision basis for subsequent weighted fusion of instructions.
[0045] In one embodiment, outputting the fusion weights for the model-free control instructions and model-based control instructions based on deep reinforcement learning algorithms and game theory includes: integrating incomplete-information game theory with a deep reinforcement learning algorithm and solving for the fusion weights using the change in ship state error as the payoff function.
[0046] Optionally, this embodiment integrates incomplete-information game theory with a deep reinforcement learning algorithm and abstracts the model-free control command and the model-based control command as two independent players. The change in state error during the ship's berthing is used as the payoff function, and the optimal fusion weights of the two commands are solved through the game interaction. At the same time, a closed-loop learning mechanism is built around the deep reinforcement learning algorithm: the decision-making strategy is continuously corrected using real-time feedback of the ship's motion, so that the fusion weights are adjusted online to meet the dynamic control requirements of ship berthing under complex sea conditions.
[0047] In the game-theoretic decision-making phase, model-free control commands and model-based control commands are treated as two separate players. The control objective of ship berthing is used as the game constraint, and the change in ship state error is used as the payoff function to construct an incomplete information game model. The model primarily includes three control objectives: position, course, and speed. Ship state error comprises core parameters such as the position deviation between the actual ship position and the desired reference trajectory, the course deviation between the actual and desired course angles, and the speed deviation between the actual and desired speeds. The payoff function aims to minimize the state error. Through game equilibrium solving, the initial fusion weights of the model-free and model-based control commands are obtained, ensuring the optimal synergistic effect of the two types of commands under the current sea state.
[0048] In the deep reinforcement learning closed-loop optimization stage, an online learning mechanism for decision-making is constructed with deep reinforcement learning algorithms at its core. The state variable deviation corresponding to the ship's new motion state is used as a state variable element, the fusion weights as output action elements, the input-output feedback process of the ship's motion as an environmental element, and the change in the ship's state error as a reward element. This continuously updates and optimizes the decision-making module. Optionally, during the ship's berthing process, the new motion state of the ship after executing control commands is collected in real time, and the state variable deviation and state error change are calculated. These are then fed back to the deep reinforcement learning network as reward signals. Through backpropagation iteration of the network, the decision-making strategy of the fusion weights is corrected, achieving dynamic adjustment of the fusion weights. This allows the decision-making module to adapt to the dynamic changes in complex sea conditions and continuously optimize the synergistic effect of the two types of control commands.
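The roles of action (fusion weight) and reward (change in state error) can be illustrated with a deliberately simple stand-in. The sketch below replaces the deep reinforcement learning network with a tabular epsilon-greedy bandit over a discretized weight and replaces the ship with a toy scalar environment; the function names, the candidate weight grid, and all numerical values are assumptions for illustration only.

```python
import random


def toy_error_reduction(w, rng):
    """Toy environment: reward = error before fusion minus error after fusion."""
    u_ideal = 1.0
    u_model_based = u_ideal + 0.4        # biased by a hypothetical model mismatch
    u_model_free = u_ideal - 0.1         # small hypothetical residual learning error
    disturbance = rng.uniform(-0.01, 0.01)
    fused = w * u_model_free + (1.0 - w) * u_model_based + disturbance
    return abs(u_model_based - u_ideal) - abs(fused - u_ideal)


def learn_fusion_weight(steps=300, epsilon=0.1, seed=0):
    """Epsilon-greedy search for the model-free weight with the largest mean reward."""
    rng = random.Random(seed)
    weights = [i / 10 for i in range(11)]    # candidate weights 0.0 .. 1.0
    q = [1.0] * len(weights)                 # optimistic start: every action gets tried
    counts = [0] * len(weights)
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(len(weights))                    # explore
        else:
            a = max(range(len(weights)), key=q.__getitem__)    # exploit
        reward = toy_error_reduction(weights[a], rng)
        counts[a] += 1
        q[a] += (reward - q[a]) / counts[a]                    # running mean of rewards
    return weights[max(range(len(weights)), key=q.__getitem__)]


best_weight = learn_fusion_weight()
```

In this toy setting the two command errors cancel at a model-free weight of 0.8, which is the weight the search settles on; a deep network would be needed once the best weight depends on the ship state and sea state.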
[0049] The fusion weights output in this step will be synchronously transmitted to subsequent steps for weighted fusion of model-free control commands and model-based control commands, ultimately generating final control commands that meet the ship's berthing control requirements.
[0050] Step 104: Based on the fusion weights, the model-free control command and the model-based control command are weighted and fused to obtain the final control command, which is output to the ship's actuators.
[0051] This step takes the model-free control command and model-based control command generated in step 102, as well as the fusion weight of the two types of commands output in step 103, as the core input. The final control command is generated through weighted fusion calculation and output to the ship's actuators to complete the command execution link of ship berthing control. This realizes the synergistic implementation of the stability advantage of model-based control and the self-learning advantage of model-free control.
[0052] The weighted fusion in this step is based on the fusion weights obtained in step 103. Corresponding weight coefficients are assigned to model-free control commands and model-based control commands respectively. The fusion of the two types of commands is completed through linear weighted calculation, so that the final control command takes into account the global stability and resistance to model parameter uncertainty of model-based control, as well as the trajectory tracking accuracy and historical data iterative optimization capability of model-free control. This solves the performance defects of single control methods under complex sea conditions and adapts to the high-precision and high-robust control requirements of ship berthing operations.
[0053] Let the fusion weights output in step 103 be $w_1$ and $w_2$, the model-free control command generated in step 102 be $u_{\mathrm{mf}}$, and the model-based control command be $u_{\mathrm{mb}}$. The final control command $u$ is then calculated using the weighted fusion formula: $u = w_1 u_{\mathrm{mf}} + w_2 u_{\mathrm{mb}}$.
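The linear weighted fusion of this step can be sketched in a few lines of Python. The function name `fuse_commands` and the representation of each command as a list of actuator channels (for example rudder angle and propeller speed) are assumptions for illustration; the source does not state whether the two weights are normalized, so the sketch does not enforce it.

```python
def fuse_commands(u_model_free, u_model_based, w_model_free, w_model_based):
    """Linearly weight and sum two command vectors, channel by channel.

    Each command is a sequence of actuator channels. Both sequences must
    describe the same channels in the same order.
    """
    if len(u_model_free) != len(u_model_based):
        raise ValueError("command vectors must have the same length")
    return [
        w_model_free * a + w_model_based * b
        for a, b in zip(u_model_free, u_model_based)
    ]


# Example: two channels, model-based command weighted more heavily.
final_command = fuse_commands([10.0, 100.0], [20.0, 200.0], 0.25, 0.75)
```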
[0054] In the formula, the fusion weights are solved dynamically by the deep reinforcement learning and game theory decision-making module in step 103. They can be adjusted in real time according to changes in sea state and state error during the ship's berthing process: when the sea state is complex and the model parameters are highly uncertain, the weight of the model-based control command is automatically increased to enhance system stability; when the trajectory tracking accuracy requirements are high and historical operation data is sufficient, the weight of the model-free control command is automatically increased to optimize the tracking effect. This achieves adaptive synergy between the two control methods.
[0055] After the weighted fusion calculation is completed, the generated final control command is output to the ship's steering gear, propeller and other actuators through the ship actuator interface module. The actuators then perform corresponding rudder angle adjustment, speed adjustment and other operations according to the final control command, driving the ship to complete the berthing operation along the desired reference trajectory.
[0056] The final control command generated in this step is the core output that complements the advantages of the two control methods. It provides a stable and accurate execution basis for the closed-loop control of ship berthing, ensuring that the ship berths smoothly and accurately in complex sea conditions.
[0057] In one embodiment, after the final control command is obtained and output to the ship's actuators, the method further includes: based on the ship's new motion state after the final control command is executed, correcting the learning strategy to achieve closed-loop optimized control of ship berthing.
[0058] This step is the closed-loop optimization stage of ship berthing control. The ship's new motion state after it executes the final control command is used as the core feedback basis to correct, in real time, the decision learning strategy constructed by deep reinforcement learning and game theory. At the same time, the learning memory of the iterative learning controller is updated and iterated, so that the entire ship berthing control system is dynamically optimized and upgraded through self-learning, ensuring continuous improvement of the accuracy and robustness of ship berthing control under complex sea conditions.
[0059] After the ship's actuators receive and execute the final control command, the ship's new motion state quantities are collected in real time by the onboard sensing equipment. The new motion state quantities have the same dimensions as the current motion state quantities obtained in step 101 and include core parameters such as the ship's new position, heading angle, longitudinal speed, lateral speed, and yaw rate. At the same time, the latest state error between the new motion state quantities and the desired reference trajectory is calculated to provide data support for correcting the learning strategy.
[0060] Based on the collected new motion state variables of the ship and the latest state error, a comprehensive correction of the learning strategy is carried out from the perspective of the adaptability of each module of the control system. In the learning strategy correction of the decision-making part, the state variable deviation corresponding to the new motion state of the ship is used as the state variable element of deep reinforcement learning, the fusion weight output in step 103 is used as the output action element, the entire input-output feedback process of the ship from receiving the final control command, executing the action, to feedback of the new motion state is used as the environmental element, and the change of the ship's motion state error fed back by the system is used as the reward element to complete the update of the decision state and reward signals. The reward function takes the positive reduction of the ship's state error as the optimization objective. When the state error shows a decreasing trend and the rate of reduction increases, the reward function outputs a large positive value, and vice versa. During decision-making, based on the new state and reward signals, the game strategy and weight solution logic are corrected through backpropagation and parameter iteration of the deep reinforcement learning network, so that the dynamic allocation of the fusion weights is more adapted to the current sea state and ship motion state, and the positive feedback of the controller combination effect is achieved.
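The reward behavior described above (positive when the state error is decreasing, larger when the reduction is speeding up, and the reverse otherwise) could be sketched as follows. The additive form and the `rate_gain` parameter are assumptions for illustration; the source does not give the reward formula.

```python
def reward(errors, rate_gain=0.5):
    """Reward from the three most recent state errors, oldest first.

    reduction     > 0 when the error fell on the latest step.
    acceleration  > 0 when the latest reduction is larger than the
                  previous one, i.e. the rate of reduction increased.
    rate_gain is a hypothetical tuning parameter.
    """
    e0, e1, e2 = errors[-3:]
    reduction = e1 - e2
    previous_reduction = e0 - e1
    acceleration = reduction - previous_reduction
    return reduction + rate_gain * acceleration
```

With this form, an error sequence of 10, 8, 5 earns a larger reward than 10, 8, 7, because the error is both falling and falling faster.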
[0061] When updating the learning memory of the iterative learning controller, data such as the ship's new motion state variables, the latest state error, the currently executed model-free control commands, the final control commands, and the corresponding fusion weights are synchronously stored in the iterative learning controller's learning memory as historical data support for the next batch of iterative learning. Based on the updated learning memory, the fractional-order iterative learning controller will optimize the fractional-order operator matrix and the iterative learning update rate, improving the generation accuracy and convergence speed of subsequent model-free control commands, allowing the self-learning capability of model-free control to continuously enhance during the ship's berthing control process.
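One way to picture the learning memory is as a bounded buffer of per-step records holding the items listed above. This is only a sketch: the class names `BerthingRecord` and `LearningMemory`, the fixed capacity, and the `last_batch` helper are hypothetical and are not described in the source.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class BerthingRecord:
    """One stored sample: the data items named in the text."""
    state: Sequence[float]          # new motion state variables
    state_error: float              # latest state error
    u_model_free: Sequence[float]   # executed model-free command
    u_final: Sequence[float]        # final fused command
    weights: Sequence[float]        # fusion weights used


class LearningMemory:
    """Bounded history; the oldest records are dropped when full."""

    def __init__(self, capacity: int = 1000):
        self._records = deque(maxlen=capacity)

    def store(self, record: BerthingRecord) -> None:
        self._records.append(record)

    def last_batch(self, n: int) -> List[BerthingRecord]:
        """Return the n most recent records, oldest first."""
        if n <= 0:
            return []
        return list(self._records)[-n:]

    def __len__(self) -> int:
        return len(self._records)
```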
[0062] After completing the learning strategy correction and learning memory update, the corrected learning strategy and updated historical data are used as inputs for the next round of control calculations. The entire process of control, from the acquisition of ship motion state variables to the generation and execution of final control commands, is restarted, forming a complete closed-loop optimization control system.
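The closed loop formed by steps 101 to 104 plus the strategy correction can be sketched as a simple loop. For brevity the sketch assumes a scalar state and scalar commands, and every callable passed in (`model_free_cmd`, `model_based_cmd`, `decide_weights`, `apply_to_ship`, `correct_strategy`) is a hypothetical placeholder for the corresponding component of the method.

```python
def berthing_loop(state, reference, steps, model_free_cmd, model_based_cmd,
                  decide_weights, apply_to_ship, correct_strategy):
    """Run the acquire / generate / decide / fuse / execute / correct cycle.

    Returns the list of states reached after each control step.
    """
    history = []
    for _ in range(steps):
        error = reference - state                   # step 101: state vs. reference
        u_mf = model_free_cmd(state, reference)     # step 102: model-free command
        u_mb = model_based_cmd(state, reference)    # step 102: model-based command
        w_mf, w_mb = decide_weights(error)          # step 103: fusion weights
        u = w_mf * u_mf + w_mb * u_mb               # step 104: weighted fusion
        new_state = apply_to_ship(state, u)         # actuators execute the command
        # Feedback: correct the learning strategy using old and new errors.
        correct_strategy(error, reference - new_state, (w_mf, w_mb))
        history.append(new_state)
        state = new_state
    return history


# Toy usage with stand-in components (all hypothetical).
trace = berthing_loop(
    state=0.0, reference=1.0, steps=10,
    model_free_cmd=lambda s, r: 0.5 * (r - s),
    model_based_cmd=lambda s, r: 0.5 * (r - s),
    decide_weights=lambda e: (0.5, 0.5),
    apply_to_ship=lambda s, u: s + u,
    correct_strategy=lambda old_e, new_e, w: None,
)
```

In the toy usage the error halves at every step, so the state approaches the reference geometrically.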
[0063] Through this closed-loop optimization process, this embodiment continuously weighs the suitability of old control experience against new control strategies and continuously explores and improves the rules for combining model-free and model-based control, so that the two types of controllers adaptively compensate for each other's defects in dynamic sea conditions. The ship berthing control system thereby gains the ability to learn and optimize itself, which effectively addresses the loss of accuracy caused by dynamic changes in the control environment under complex sea conditions, continuously improves berthing control accuracy, maintains a stable control effect, and ultimately achieves high-precision, high-robustness berthing control under complex sea conditions.
[0064] This invention provides a ship berthing control method based on a combination of model-based and model-free control game theory. The method involves acquiring the ship's current motion state and desired reference trajectory; generating model-free and model-based control commands based on them; outputting fusion weights for the model-free and model-based control commands using deep reinforcement learning algorithms and game theory; and weighting and fusing the model-free and model-based control commands according to these weights to obtain the final control command, which is then output to the ship's actuators. This invention generates both model-free and model-based control commands simultaneously, combining the advantages of both control methods. It relies on model-based control to ensure basic system stability and on model-free control with fractional-order iterative learning for adaptive optimization, overcoming the performance deficiencies of single control methods in the existing technology. Based on deep reinforcement learning and game theory, it dynamically solves the fusion weights, abandoning fixed-weight strategies so that the weights adjust with ship status and sea state and adapt to complex and ever-changing marine environments. The two commands are dynamically weighted and fused before being output for execution, giving the final control command both stability and self-learning tracking advantages and effectively improving the accuracy and robustness of ship berthing control under complex sea conditions. The entire process is based on the ship's real-time status and desired trajectory, with each step linked to the next, making the control system better aligned with the actual needs of ship berthing and improving overall control adaptability.
[0065] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
[0066] The following are embodiments of the apparatus of the present invention. For details not described in detail, please refer to the corresponding method embodiments described above.
[0067] Figure 2 is a schematic diagram of a ship berthing control device based on a combination of model-based and model-free control games according to an embodiment of the present invention. For ease of explanation, only the parts relevant to the embodiment of the present invention are shown, and they are described in detail below. As shown in Figure 2, the ship berthing control device includes: acquisition module 21, controller module 22, decision module 23, and output module 24.
[0068] Acquisition module 21 is used to acquire the ship's current motion state and the desired reference trajectory. Controller module 22 is used to generate model-free control commands and model-based control commands based on the ship's current motion state and the desired reference trajectory. Decision module 23 is used to output fusion weights for the model-free control commands and the model-based control commands based on deep reinforcement learning algorithms and game theory. Decision module 23 is also used to perform weighted fusion of the model-free control commands and the model-based control commands according to the fusion weights to obtain the final control command. Output module 24 is used to output the final control command obtained by weighted fusion to the ship's actuators.
[0069] In one possible implementation, when generating the model-free control command and the model-based control command based on the ship's current motion state and the desired reference trajectory, controller module 22 is configured to: generate the model-free control command using a fractional-order iterative learning controller, based on the ship's current motion state and the desired reference trajectory, where the update rate of the fractional-order iterative learning controller is of fractional order; and generate the model-based control command using a backstepping controller, based on the ship's current motion state and the desired reference trajectory, where the backstepping controller is designed based on the ship's nonlinear dynamics model.
[0070] In one possible implementation, when controller module 22 generates model-free control commands using a fractional-order iterative learning controller based on the ship's current motion state and the desired reference trajectory, it is used for: Based on the ship's current motion state and desired reference trajectory, the control law is updated through iterative learning. Based on the current batch tracking error, the control input of the previous batch iterative learning, and the adjustment of the fractional operator matrix, model-free control commands are generated.
[0071] In one possible implementation, the controller module 22 updates the control law iteratively based on the ship's current motion state and the desired reference trajectory. When generating model-free control commands based on the current batch tracking error, the previous batch iteratively learned control input, and adjustments to the fractional-order operator matrix, it is used for: according to Generate model-free control instructions; in, Indicates the first Batch No. Model-free control instructions generated by time-iterative learning Indicates the first Batch No. Model-free control instructions generated by time-iterative learning Indicates the sampling time or the number of samples. Indicates the first Batch No. Time in succession The update rate of time-based learning. , Let them represent fractional operator matrices, Indicates the first Batch No. The tracking error corresponding to the expected reference trajectory at the given time. Indicates the first Batch No. The tracking error corresponding to the motion state variables over time. Indicates the first Batch No. The expected reference trajectory corresponding to the time. Indicates the first Batch No. The actual output trajectory corresponding to the time. Indicates the first Batch No. The motion state value corresponding to time. Indicates the first Batch No. The motion state value corresponding to time.
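For readers unfamiliar with fractional-order iterative learning, the following Python sketch shows a generic update of the form "previous batch input plus a gain times a fractional derivative of the tracking error", using the Grünwald-Letnikov approximation. It is not the formula of this embodiment: the scalar `gain` standing in for the fractional-order operator matrices, the single error signal, and all function names are assumptions for illustration.

```python
def gl_coefficients(alpha, n):
    """Grunwald-Letnikov weights c_j = (-1)^j * binom(alpha, j), j = 0..n-1."""
    coeffs = [1.0]
    for j in range(1, n):
        coeffs.append(coeffs[-1] * (1.0 - (alpha + 1.0) / j))
    return coeffs


def fractional_derivative(signal, alpha, dt):
    """Approximate the order-alpha derivative of a sampled signal.

    For alpha = 1 this reduces to the backward difference, and for
    alpha = 0 it returns the signal unchanged.
    """
    coeffs = gl_coefficients(alpha, len(signal))
    result = []
    for n in range(len(signal)):
        acc = sum(coeffs[j] * signal[n - j] for j in range(n + 1))
        result.append(acc / dt ** alpha)
    return result


def ilc_update(u_previous, tracking_error, gain, alpha, dt):
    """Generic batch update: u_new(t) = u_prev(t) + gain * D^alpha e(t)."""
    d_error = fractional_derivative(tracking_error, alpha, dt)
    return [u + gain * d for u, d in zip(u_previous, d_error)]
```

Choosing a non-integer `alpha` between 0 and 1 blends proportional-type and derivative-type learning, which is the usual motivation for a fractional-order update.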
[0072] In one possible implementation, when generating the model-based control command using a backstepping controller based on the ship's current motion state and the desired reference trajectory, controller module 22 is configured to: perform a global diffeomorphic (differential homeomorphic) transformation on the ship's heading control system and, based on the ship's current motion state and the desired reference trajectory, progressively construct virtual control laws, define error variables, and design Lyapunov functions, thereby obtaining the model-based control command.
[0073] In one possible implementation, the state variables set in the global differential homeomorphic transformation include: ; in, , , These represent the first-order state variable, the second-order state variable, and the third-order state variable, respectively. Indicates the ship's heading angle. Indicates the bow roll rate of the ship. express The first derivative; After a global differential homeomorphic transformation, the nonlinear dynamic model of the ship's heading motion is transformed into... ; in, express The first derivative with respect to time, express The first derivative with respect to time, express The first derivative with respect to time, This represents the estimated value of the nonlinear term. This represents the actual value of the nonlinear term. This represents the control gain coefficient. This represents the input quantity to the backstepping controller. Indicates the nonlinear hydrodynamic coefficient of a ship. Represents the ship's maneuvering time constant. This represents the servo motor time constant.
[0074] In one possible implementation, when progressively constructing the virtual control laws, defining the error variables, and designing the Lyapunov functions to obtain the model-based control command, controller module 22 is configured to: based on the transformed nonlinear dynamic model of the ship's heading motion, define the first-level error variable with the desired heading angle as the reference, construct the first-level virtual control law matching the first-level error variable, and design the first-level Lyapunov function; based on the first-level virtual control law, define the second-level error variable and design the second-level Lyapunov function; and construct the second-level virtual control law matching the second-level error variable, define the third-level error variable, and derive the model-based control command in combination with the nonlinear dynamic model of the ship's heading motion.
[0075] In one possible implementation, when outputting the fusion weights for the model-free and model-based control commands based on deep reinforcement learning algorithms and game theory, decision module 23 is configured to: solve the fusion weights by integrating incomplete-information game theory with a deep reinforcement learning algorithm, using the change in ship state error as the payoff function.
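The three-level backstepping procedure can be illustrated on a generic third-order chain x1' = x2, x2' = x3, x3' = f + b·u, with x1 the heading angle. This chain form, the constant desired heading, the unit gains, and the toy nonlinear term in `simulate` are all assumptions for illustration; they are not the transformed model or the parameters of this embodiment.

```python
def backstepping_command(x, psi_d, f_hat, b, gains=(1.0, 1.0, 1.0)):
    """Backstepping law for x1' = x2, x2' = x3, x3' = f + b*u.

    psi_d is a constant desired heading; f_hat estimates the nonlinear term.
    With V = (z1^2 + z2^2 + z3^2) / 2 and f_hat = f, the law gives
    V' = -k1*z1^2 - k2*z2^2 - k3*z3^2.
    """
    x1, x2, x3 = x
    k1, k2, k3 = gains
    z1 = x1 - psi_d                      # first-level error variable
    alpha1 = -k1 * z1                    # first-level virtual control law
    z2 = x2 - alpha1                     # second-level error variable
    d_alpha1 = -k1 * x2                  # time derivative of alpha1
    alpha2 = -z1 - k2 * z2 + d_alpha1    # second-level virtual control law
    z3 = x3 - alpha2                     # third-level error variable
    d_alpha2 = -x2 - k2 * (x3 - d_alpha1) - k1 * x3
    return (-f_hat + d_alpha2 - z2 - k3 * z3) / b


def simulate(psi_d=0.5, dt=0.01, steps=2000, b=2.0):
    """Euler simulation with a hypothetical nonlinear term f."""
    x = [0.0, 0.0, 0.0]
    for _ in range(steps):
        f = -1.5 * x[2] - 0.8 * x[1] - 0.3 * x[1] ** 3
        u = backstepping_command(x, psi_d, f, b)
        x = [x[0] + dt * x[1], x[1] + dt * x[2], x[2] + dt * (f + b * u)]
    return x
```

In this idealized setting the nonlinear term is cancelled exactly, so the heading settles at the desired value; with an imperfect estimate `f_hat`, the residual acts as a disturbance on the third error equation.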
[0076] In one possible implementation, after the output module 24 outputs the final control command obtained from the weighted fusion to the ship's actuators, the decision module 23 is further used for: Based on the ship's new motion state after the final control command is executed, the learning strategy is modified to achieve closed-loop optimized control for ship berthing.
[0077] The above embodiment provides a ship berthing control device based on a combination of model-based and model-free control game theory. The acquisition module acquires the ship's current motion state and desired reference trajectory; the controller module generates model-free control commands and model-based control commands based on the ship's current motion state and desired reference trajectory; the decision module, based on deep reinforcement learning algorithms and game theory, outputs fusion weights for the model-free and model-based control commands; the model-free and model-based control commands are weighted and fused according to the fusion weights to obtain the final control command; and the output module outputs the weighted and fused final control command to the ship's actuators. This invention generates both model-free and model-based dual control commands simultaneously, integrating the advantages of both control methods. It relies on model-based control to ensure basic system stability while leveraging model-free control with fractional-order iterative learning for adaptive optimization, overcoming the performance limitations of single control methods in existing technologies. Based on deep reinforcement learning and game theory, it dynamically solves for the fusion weights, abandoning fixed-weight strategies and allowing the weights to dynamically adjust with ship status and sea state, adapting to complex and ever-changing marine environments. The dual commands are dynamically weighted and fused before execution, giving the final control commands both stability and self-learning tracking advantages, effectively improving the accuracy and robustness of ship berthing control under complex sea conditions. 
The entire process is based on the ship's real-time status and desired trajectory, with each step linked and interconnected, making the control system more aligned with the actual needs of ship berthing and improving overall control adaptability.
[0078] Figure 3 is a schematic diagram of an electronic device provided in an embodiment of the present invention. As shown in Figure 3, the electronic device 3 of this embodiment includes a processor 30 and a memory 31. The memory 31 stores a computer program 32. When the processor 30 executes the computer program 32, it implements the steps in the various method embodiments described above. Alternatively, when the processor 30 executes the computer program 32, it implements the functions of each module / unit in the various device embodiments described above.
[0079] For example, computer program 32 may be divided into one or more modules / units, which are stored in memory 31 and executed by processor 30 to complete the present invention. The one or more modules / units may be a series of computer program instruction segments capable of performing a specific function, which describe the execution process of computer program 32 in electronic device 3.
[0080] Electronic device 3 may include, but is not limited to, processor 30 and memory 31. Those skilled in the art will understand that Figure 3 is merely an example of electronic device 3 and does not constitute a limitation on electronic device 3. It may include more or fewer components than shown, combine certain components, or use different components. For example, electronic device 3 may also include input / output devices, network access devices, buses, etc.
[0081] The processor 30 can be a central processing unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor can be a microprocessor or any conventional processor.
[0082] The memory 31 can be an internal storage unit of the electronic device 3, such as a hard disk or memory of the electronic device 3. The memory 31 can also be an external storage device of the electronic device 3, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, etc., equipped on the electronic device 3. Furthermore, the memory 31 can include both internal and external storage units of the electronic device 3. The memory 31 is used to store the computer program 32 and other programs and data required by the electronic device 3. The memory 31 can also be used to temporarily store data that has been output or will be output.
[0083] For the sake of simplicity and clarity, only the above-described functional modules / units are used as examples. In practical applications, the functions described above can be assigned to different functional modules / units as needed. These modules / units can be implemented in hardware, software, or a combination of both.
[0084] This invention also provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, it implements the methods described in the above-described method embodiments.
[0085] This invention also provides a computer program product, including a computer program. When the computer program is executed by a processor, it implements the methods described in the above-described method embodiments.
[0086] Computer programs include computer program code, which can be in the form of source code, object code, executable files, or certain intermediate forms. Computer-readable media can include: any entity or device capable of carrying computer program code, recording media, USB flash drives, portable hard drives, magnetic disks, optical disks, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media, etc.
[0087] In the above embodiments, the descriptions of each embodiment have their own emphasis. Parts not detailed or described in a particular embodiment can be referred to in the relevant descriptions of other embodiments. Unless otherwise specified or in conflict with logic, the terminology and / or descriptions between different embodiments are consistent and can be referenced interchangeably. Technical features in different embodiments can be combined to form new embodiments based on their inherent logical relationships.
[0088] The above-described embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.
Claims
1. A ship berthing control method based on a combination of presence and absence model control games, characterized in that it comprises: acquiring the ship's current motion state and a desired reference trajectory; generating a model-free control command and a model-based control command based on the ship's current motion state and the desired reference trajectory; outputting fusion weights for the model-free control command and the model-based control command based on a deep reinforcement learning algorithm and game theory; and performing weighted fusion of the model-free control command and the model-based control command according to the fusion weights to obtain a final control command, which is output to the ship's actuators.
2. The ship berthing control method based on a combination of presence and absence model control games according to claim 1, characterized in that, Based on the ship's current motion state and the desired reference trajectory, model-free control commands and model-based control commands are generated, including: Based on the ship's current motion state and the desired reference trajectory, a fractional-order iterative learning controller is used to generate the model-free control command. The update rate of the fractional-order iterative learning controller is set in fractional order. Based on the ship's current motion state and the desired reference trajectory, a backstepping controller is used to generate the model-based control command. The backstepping controller is designed based on the ship's nonlinear dynamics model.
3. The ship berthing control method based on a combination of presence and absence model control games according to claim 2, characterized in that, Based on the ship's current motion state and the desired reference trajectory, a fractional-order iterative learning controller is used to generate the model-free control commands, including: Based on the ship's current motion state and the desired reference trajectory, the control law is updated through iterative learning. Based on the current batch tracking error, the control input of the previous batch iterative learning, and the adjustment of the fractional operator matrix, a model-free control command is generated.
4. The ship berthing control method based on a combination of presence and absence model control games according to claim 3, characterized in that, Based on the ship's current motion state and the desired reference trajectory, the control law is updated through iterative learning. Based on the current batch tracking error, the control input from the previous batch iterative learning, and adjustments to the fractional-order operator matrix, model-free control commands are generated, including: according to Generate model-free control instructions; in, Indicates the first Batch No. Model-free control instructions generated by time-iterative learning Indicates the first Batch No. Model-free control instructions generated by time-iterative learning Indicates the sampling time or the number of samples. Indicates the first Batch No. Update rate of time-based iterative learning , Let them represent fractional operator matrices, Indicates the first Batch No. The tracking error corresponding to the expected reference trajectory at the given time. Indicates the first Batch No. The tracking error corresponding to the motion state variables over time. Indicates the first Batch No. The expected reference trajectory corresponding to the time. Indicates the first Batch No. The actual output trajectory corresponding to the time. Indicates the first Batch No. The motion state value corresponding to time. Indicates the first Batch No. The motion state value corresponding to time.
5. The ship berthing control method based on a combination of presence and absence model control games according to claim 2, characterized in that generating the model-based control command using a backstepping controller, based on the ship's current motion state and the desired reference trajectory, comprises: performing a global diffeomorphic (differential homeomorphic) transformation on the ship's heading control system and, based on the ship's current motion state and the desired reference trajectory, progressively constructing virtual control laws, defining error variables, and designing Lyapunov functions, thereby obtaining the model-based control command.
6. The ship berthing control method based on a combination of presence and absence model control games according to claim 5, characterized in that, The state variables set in the global differential homeomorphic transformation include: ; in, , , These represent the first-order state variable, the second-order state variable, and the third-order state variable, respectively. Indicates the ship's heading angle. Indicates the bow roll rate of the ship. express The first derivative; After the global differential homeomorphic transformation, the nonlinear dynamic model of the ship's heading motion is transformed into... ; in, express The first derivative with respect to time, express The first derivative with respect to time, express The first derivative with respect to time, This represents the estimated value of the nonlinear term. This represents the actual value of the nonlinear term. This represents the control gain coefficient. This represents the input quantity to the backstepping controller. Indicates the nonlinear hydrodynamic coefficient of a ship. Represents the ship's maneuvering time constant. This represents the servo motor time constant.
7. The ship berthing control method based on a combination of presence and absence model control games according to claim 6, characterized in that, The virtual control law is constructed step by step, error variables are defined, and Lyapunov functions are designed to obtain the model-based control command, including: Based on the transformed nonlinear dynamic model of the ship's heading motion, a first-level error variable is defined with the desired heading angle as the reference, a first-level virtual control law matching the first-level error variable is constructed, and a first-level Lyapunov function is designed. Based on the first-level virtual control law, a second-level error variable is defined, and a second-level Lyapunov function is designed; A second-level virtual control law matching the second-level error variable is constructed, a third-level error variable is defined, and the model-based control command is derived by combining the nonlinear dynamic model of the ship's heading motion.
8. The ship berthing control method based on a combination of presence and absence model control games according to any one of claims 1-7, characterized in that, Based on deep reinforcement learning algorithms and game theory, the fusion weights for the model-free control instructions and the model-based control instructions are output, including: The fusion weights are solved by integrating incomplete information game theory and deep reinforcement learning algorithms, using the change in ship state error as the payoff function.
9. The ship berthing control method based on a combination of presence and absence model control games according to claim 8, characterized in that, after the final control command is obtained and output to the ship's actuators, the method further comprises: based on the ship's new motion state after the final control command is executed, correcting the learning strategy to achieve closed-loop optimized control of the ship's berthing.
10. A ship berthing control device based on a combination of presence-absence model control games, characterized in that it comprises: an acquisition module, used to acquire the ship's current motion state and a desired reference trajectory; a controller module, used to generate a model-free control command and a model-based control command based on the ship's current motion state and the desired reference trajectory; a decision module, used to output fusion weights for the model-free control command and the model-based control command based on a deep reinforcement learning algorithm and game theory, and further configured to perform weighted fusion of the model-free control command and the model-based control command according to the fusion weights to obtain a final control command; and an output module, used to output the final control command obtained by weighted fusion to the ship's actuators.