
A Floating Control Method for Target Area of Underwater Vehicle Based on Double Critic Reinforcement Learning Technology

An underwater vehicle and target-area technology, applied to neural learning methods, underwater ships, underwater operation equipment, and the like. It addresses problems such as the explosion in the number of Q values, the slow convergence of algorithm training, and the failure to consider expert data that is easy to acquire and reliable in performance, so as to achieve a good control effect and fast convergence.

Active Publication Date: 2022-03-25
SHANDONG UNIV

AI Technical Summary

Problems solved by technology

[0005] However, existing research and inventions based on traditional RL for underwater vehicle control have some significant defects. First, traditional reinforcement learning algorithms such as Q-learning must construct a huge Q-value table to store the Q(s, a) values of the high-dimensional action and state spaces, and as the agent continues training in these high-dimensional spaces, the number of Q values in the Q-value table explodes, which makes this method very limited.
Then, with the combination of deep learning and traditional Q-learning proposed by the Google DeepMind team, the deep reinforcement learning (DRL) algorithm was born. In this algorithm the Q-value table is replaced by a neural network, yielding the DQN (Deep Q-Network) (V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, pp. 529-533, 2015). However, the DQN algorithm is only suitable for discrete action spaces, which restricts its application to the intelligent control of underwater vehicles. DDPG (Deep Deterministic Policy Gradient) (Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning [J]. Computer Science, 2015, 8(6): A187) is a control algorithm suitable for continuous action spaces, but the Q(s, a) output by the critic network comes from the expectation of the action-value function, which leads to the disadvantage of overestimation.
Moreover, the above RL methods do not consider expert data, which is easy to obtain and reliable in performance; as a result, the algorithms converge slowly during training and behave largely at random in the early training stage.
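The remedy the invention describes is to mix previously collected expert transitions with the agent's own interaction data in one replay pool. A minimal sketch of such a mixed buffer follows; the class name, sampling ratio, and capacity are illustrative assumptions, not values from the patent:

```python
import random
from collections import deque

class MixedReplayBuffer:
    """Replay pool mixing fixed expert demonstrations with rolling
    agent-environment interaction data (illustrative sketch)."""

    def __init__(self, capacity=100_000, expert_fraction=0.25):
        self.expert = []                      # reliable expert transitions
        self.online = deque(maxlen=capacity)  # rolling interaction data
        self.expert_fraction = expert_fraction

    def add_expert(self, transition):
        self.expert.append(transition)

    def add_online(self, transition):
        self.online.append(transition)

    def sample(self, batch_size):
        # Draw a fixed share of each batch from the expert pool so that
        # early training is guided by demonstrations instead of pure
        # random exploration.
        n_exp = min(int(batch_size * self.expert_fraction), len(self.expert))
        batch = random.sample(self.expert, n_exp)
        batch += random.sample(list(self.online), batch_size - n_exp)
        return batch
```

Sampling a constant expert fraction per batch is one simple way to realize the "mixed collection" the abstract mentions; the patent itself does not specify the exact mixing scheme.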

Method used



Examples


Embodiment 1

[0098] A method for controlling the floating of an underwater vehicle to a target area based on double-critic reinforcement learning technology. The implementation of the present invention is divided into two parts, a task-environment construction stage and a floating-strategy training stage, and includes the following steps:

[0099] 1. Define the task environment and model:

[0100] 1-1. Construct the task environment of the target area where the underwater vehicle is located and the dynamic model of the underwater vehicle;

[0101] The underwater vehicle simulation task environment is written in the Python language in the VS Code integrated development environment. The geographic coordinate system E-ξηζ of the constructed simulated pool map is shown in Figure 3. The size of the three-dimensional pool is set to 50 m × 50 m × 50 m, and the successful floating region of the target area is a cylindrical area with the center of the wa...
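The pool and target region described in paragraph [0101] can be sketched as a simple membership check. Because the original text is cut off, the cylinder's center, radius, and depth below are placeholder values, not the patent's parameters:

```python
import math

# Pool: 50 m x 50 m x 50 m in the geographic frame E-xi-eta-zeta.
POOL_SIZE = 50.0

# Hypothetical target cylinder near the water surface; the patent text
# is truncated, so these numbers are illustrative placeholders only.
TARGET_CENTER = (25.0, 25.0)  # (xi, eta) of the cylinder axis
TARGET_RADIUS = 5.0           # metres
TARGET_DEPTH = 2.0            # success once within this depth of the surface

def in_target_region(xi, eta, zeta):
    """Return True if the vehicle lies inside the cylindrical
    floating-success region of the simulated pool."""
    dx = xi - TARGET_CENTER[0]
    dy = eta - TARGET_CENTER[1]
    horizontal_ok = math.hypot(dx, dy) <= TARGET_RADIUS
    vertical_ok = 0.0 <= zeta <= TARGET_DEPTH
    return horizontal_ok and vertical_ok
```

A check of this shape is what a Python task environment would evaluate each step to decide whether the floating episode has succeeded.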



Abstract

The invention relates to a method for controlling the floating of an underwater vehicle to a target area based on double-critic reinforcement learning technology, belonging to the technical field of marine control experiments, and is based on the DDPG algorithm framework in deep reinforcement learning. Both previously obtained expert data and interaction data obtained from the interaction between the agent and the task environment are used, and mixing the two greatly improves the convergence speed of the algorithm. At the same time, the present invention uses two mutually independent sets of critic networks and obtains the loss function of the actor network by taking the minimum of the Q(s, a) values output by the two groups, which effectively reduces the overestimation present in reinforcement learning algorithms.
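The double-critic update the abstract describes, taking the minimum of two independent critics' Q(s, a) outputs to curb overestimation, can be sketched as follows. The toy linear critics stand in for the patent's neural critic networks, and all names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_critic(state_dim, action_dim):
    """Toy linear critic Q(s, a) = w . [s; a], an illustrative stand-in
    for one of the two neural critic networks in the patent."""
    w = rng.normal(size=state_dim + action_dim)
    return lambda s, a: float(w @ np.concatenate([s, a]))

def double_critic_target(critic1, critic2, reward, next_state,
                         next_action, gamma=0.99, done=False):
    """TD target built from the minimum of two independent critics.

    Taking min(Q1, Q2) suppresses the single-critic overestimation
    bias noted for plain DDPG in the background section.
    """
    if done:
        return reward
    q1 = critic1(next_state, next_action)
    q2 = critic2(next_state, next_action)
    return reward + gamma * min(q1, q2)
```

In a full training loop this target would regress both critics, while the actor loss would likewise be formed from the minimum of the two critic outputs, as the abstract states.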

Description

Technical Field

[0001] The invention relates to a method for controlling the floating of an underwater vehicle to a target area based on double-critic reinforcement learning technology, and belongs to the technical field of ocean control experiments.

Background Technique

[0002] As key marine equipment, underwater vehicles are widely used in many scientific research and engineering fields, such as ocean topographic mapping, resource exploration, archaeological investigation, pipeline maintenance, and biological monitoring, and are an important means for humans to explore the ocean. However, the seabed environment is complex and changeable. If an underwater vehicle working in such an environment fails to float up to the area where the mother ship is located in a timely, safe, and intelligent manner when encountering a fault or strong interference, economic losses and the loss of important data will inevitably result. Therefore, in order to enhance the adaptabil...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F30/28, G06N3/04, G06N3/08, B63G8/18, B63G8/14
CPC: G06N3/08, G06F30/28, B63G8/14, B63G8/18, G06N3/045
Inventors: 李沂滨, 张天泽, 缪旭弘, 魏征, 尤岳, 周广礼, 贾磊, 庄英豪, 宋艳
Owner SHANDONG UNIV