A Floating Control Method for Target Area of Underwater Vehicle Based on Double Critic Reinforcement Learning Technology

The technology concerns underwater vehicles and target areas and is applied in neural-network learning methods, underwater vessels, and underwater operating equipment. It addresses the explosive growth in the number of Q values, the slow convergence of algorithm training, and the failure to use expert data that is easy to obtain and reliable in performance, and it achieves good control performance and fast convergence.

Active Publication Date: 2022-03-25

SHANDONG UNIV

Cites: 5; Cited by: 0


AI Technical Summary

Problems solved by technology

[0005] However, existing research and inventions that apply traditional RL to underwater vehicle control have significant defects. First, traditional reinforcement learning algorithms such as Q-learning must construct a huge Q-value table to store the Q(s, a) values over high-dimensional action and state spaces. As the agent keeps training in these high-dimensional spaces, the number of Q values in the table explodes, which severely limits this method.
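Why the table explodes can be checked with simple arithmetic: if each of d_s state dimensions and d_a action dimensions is discretized into n bins, a tabular Q-function needs n^(d_s + d_a) entries. The helper below is an illustrative sketch; the function name, the dimension counts, and the bin count of 10 are hypothetical values, not figures from the patent.

```python
def q_table_entries(state_dims: int, action_dims: int, bins: int) -> int:
    """Count Q(s, a) cells when every state and action dimension is split into `bins` intervals."""
    # One cell per combination of discretized state and action values.
    return bins ** (state_dims + action_dims)


# Hypothetical sizes for illustration only; they are not taken from the patent.
for state_dims in (2, 6, 12):
    print(state_dims, q_table_entries(state_dims, action_dims=3, bins=10))
# 2 100000
# 6 1000000000
# 12 1000000000000000
```

With 10 bins and 3 action dimensions, going from 2 to 12 state dimensions grows the table from 10^5 to 10^15 entries, which is why the tabular approach stops being practical.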

Later, the Google DeepMind team combined deep learning with traditional Q-learning, giving rise to deep reinforcement learning (DRL) algorithms. In DQN (Deep Q-Network) (V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, pp. 529-533, 2015), the Q-value table is replaced by a neural network. However, DQN is only suitable for discrete action spaces, which restricts its application to the intelligent control of underwater vehicles. DDPG (Deep Deterministic Policy Gradient) (Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning [J]. Computer Science, 2015, 8(6): A187.) is a control algorithm suitable for continuous action spaces, but the Q(s, a) output by its critic network is derived from the expectation of the action-value function, which makes it prone to overestimation.
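The overestimation issue can be illustrated numerically. If every true action value is zero but a critic's estimates carry independent zero-mean noise, the value of the greedy action (a maximum over noisy estimates) is biased upward. Evaluating that same action with the minimum of two independently noisy critics, the minimum-of-two-critics idea this invention relies on, removes the upward bias and tends to underestimate slightly. The sketch below is a toy Monte Carlo experiment; the function name, action count, noise level, and trial count are arbitrary assumptions.

```python
import random
import statistics


def estimation_bias(num_actions: int = 10, noise_std: float = 1.0,
                    trials: int = 20000, seed: int = 0) -> tuple[float, float]:
    """Return (single-critic bias, double-critic bias) when all true Q values are 0."""
    rng = random.Random(seed)
    single, double = [], []
    for _ in range(trials):
        # Two independent noisy critics estimating the same true Q(s, a) = 0.
        q1 = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        q2 = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        best = max(range(num_actions), key=q1.__getitem__)
        # Single critic: the value of its own greedy action (a max over noise).
        single.append(q1[best])
        # Double critic: the smaller of the two estimates for that action.
        double.append(min(q1[best], q2[best]))
    # The true value is 0, so the sample means are the biases.
    return statistics.fmean(single), statistics.fmean(double)


single_bias, double_bias = estimation_bias()
print(f"single critic bias: {single_bias:+.3f}")
print(f"double critic bias: {double_bias:+.3f}")
```

With ten actions and unit noise, the single-critic bias should come out near +1.5 (the expected maximum of ten standard normal samples), while the double-critic value sits at or slightly below zero, because min(Q1, Q2) can never exceed the unbiased Q2 estimate.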

Moreover, the above RL methods do not exploit expert data, which is easy to obtain and reliable in performance. As a result, the algorithms converge slowly during training and behave largely at random in the early stage of training.



Examples


Embodiment 1

[0098] A method for controlling the floating of an underwater vehicle to a target area based on double-critic reinforcement learning technology. The implementation of the present invention is divided into two parts, the task environment construction stage and the floating strategy training stage, and includes the following steps:

[0099] 1. Define the task environment and model:

[0100] 1-1. Construct the task environment of the target area where the underwater vehicle is located and the dynamic model of the underwater vehicle;

[0101] The underwater vehicle simulation task environment is written in Python in the VS Code integrated development environment. The geographic coordinate system E-ξηζ of the constructed simulated pool map is shown in Figure 3. The three-dimensional pool measures 50 m × 50 m × 50 m, and the target area for successful floating is a cylindrical area with the center of the wa...
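The environment geometry can be sketched as two membership checks: whether the vehicle is still inside the 50 m cubic pool, and whether it has reached the cylindrical target area. This is a hypothetical sketch, not the authors' environment code. The source text is truncated before the cylinder's center and size are given, so the names `CylinderRegion` and `inside_pool`, the assumption that the pool spans [0, 50] m on every axis, the vertical cylinder axis along ζ, and all placeholder numbers are assumptions.

```python
import math
from dataclasses import dataclass

POOL_SIZE_M = 50.0  # edge length of the cubic pool stated in the text


@dataclass(frozen=True)
class CylinderRegion:
    """Hypothetical cylindrical target area whose axis is parallel to ζ.

    Every field is a placeholder; the patent excerpt is cut off before
    the cylinder's center and dimensions are specified.
    """
    center_xi: float
    center_eta: float
    radius: float
    zeta_min: float
    zeta_max: float

    def contains(self, xi: float, eta: float, zeta: float) -> bool:
        # Horizontal distance from the cylinder axis in the ξ-η plane.
        horizontal = math.hypot(xi - self.center_xi, eta - self.center_eta)
        return horizontal <= self.radius and self.zeta_min <= zeta <= self.zeta_max


def inside_pool(xi: float, eta: float, zeta: float, size: float = POOL_SIZE_M) -> bool:
    """Assumes (hypothetically) that the pool spans [0, size] on each axis of E-ξηζ."""
    return all(0.0 <= c <= size for c in (xi, eta, zeta))


# Placeholder region: axis through the pool's horizontal center, 5 m radius,
# covering ζ in [0, 2] m. None of these numbers come from the patent.
target = CylinderRegion(center_xi=25.0, center_eta=25.0, radius=5.0,
                        zeta_min=0.0, zeta_max=2.0)
print(inside_pool(10.0, 20.0, 30.0), target.contains(27.0, 24.0, 1.0))
```

In a training loop, checks like these would typically end an episode: leaving the pool as a failure, entering the target cylinder as a successful float-up.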


Abstract

The invention relates to a method for controlling the floating of an underwater vehicle to a target area based on double-critic reinforcement learning technology and belongs to the technical field of marine control experiments. Built on the DDPG algorithm framework of deep reinforcement learning, it uses both previously obtained expert data and the interaction data generated as the agent interacts with the task environment; mixing the two sources greatly improves the convergence speed of the algorithm. In addition, the invention uses two mutually independent critic networks and obtains the loss function of the actor network by taking the minimum of the Q(s, a) values output by the two critics, which effectively reduces the overestimation present in reinforcement learning algorithms.
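The two mechanisms in the abstract can be sketched in plain Python: a buffer that mixes pre-collected expert transitions with the agent's own interaction data, and an actor loss that takes the element-wise minimum of the two critics' Q(s, a) outputs. The class and function names, the 25% expert share per batch, and the buffer capacity are hypothetical, since the text does not specify them. A real implementation would compute Q1 and Q2 with neural networks in a deep learning framework rather than from plain lists.

```python
import random
from collections import deque


class MixedReplayBuffer:
    """Hypothetical replay buffer mixing expert and interaction transitions.

    The fixed per-batch expert fraction is an assumed mixing rule; the
    abstract does not say how the two data sources are combined.
    """

    def __init__(self, expert_transitions, capacity=100_000,
                 expert_fraction=0.25, seed=0):
        self.expert = list(expert_transitions)  # expert data, kept for the whole run
        self.online = deque(maxlen=capacity)    # interaction data; oldest dropped first
        self.expert_fraction = expert_fraction
        self.rng = random.Random(seed)

    def add(self, transition):
        """Store one transition collected by the agent in the environment."""
        self.online.append(transition)

    def sample(self, batch_size):
        """Draw a shuffled batch; it may be smaller than batch_size early in training."""
        n_expert = min(len(self.expert), round(batch_size * self.expert_fraction))
        n_online = min(len(self.online), batch_size - n_expert)
        batch = (self.rng.sample(self.expert, n_expert)
                 + self.rng.sample(list(self.online), n_online))
        self.rng.shuffle(batch)
        return batch


def actor_loss(q1_values, q2_values):
    """Actor loss built from two independent critics, following the abstract.

    q1_values[i] and q2_values[i] are the two critics' Q(s_i, mu(s_i)) for the
    actor's action in state s_i. Minimizing the negative mean of the
    element-wise minimum maximizes the smaller (more conservative) estimate.
    """
    conservative = [min(q1, q2) for q1, q2 in zip(q1_values, q2_values)]
    return -sum(conservative) / len(conservative)


# Usage with placeholder transitions (tuples tagged by their source).
buffer = MixedReplayBuffer([("expert", i) for i in range(100)], expert_fraction=0.25)
for step in range(1000):
    buffer.add(("online", step))
batch = buffer.sample(64)
n_expert_in_batch = sum(1 for source, _ in batch if source == "expert")
print(len(batch), n_expert_in_batch, actor_loss([1.0, 3.0], [2.0, 0.5]))
# 64 16 -0.75
```

Keeping the expert set outside the bounded deque means expert transitions are never evicted, so every batch still contains reliable examples after the interaction data has been refreshed many times.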

Description

Technical field

[0001] The invention relates to a method for controlling the floating of an underwater vehicle to a target area based on double-critic reinforcement learning technology and belongs to the technical field of ocean control experiments.

Background

[0002] As key marine equipment, underwater vehicles are widely used in scientific research and engineering fields such as ocean topographic mapping, resource exploration, archaeological investigation, pipeline maintenance, and biological monitoring, and are an important means for humans to explore the ocean. However, the seabed environment is complex and changeable. When an underwater vehicle working in such an environment encounters a fault or strong interference, failing to float up to the area where the mother ship is located in a timely, safe, and intelligent manner will inevitably lead to economic losses and the loss of important data. Therefore, in order to enhance the adaptabil...


Application Information


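Patent Type & Authority: Patents (China)

IPC(8): G06F30/28; G06N3/04; G06N3/08; B63G8/18; B63G8/14

CPC: G06N3/08; G06F30/28; B63G8/14; B63G8/18; G06N3/045

Inventors: 李沂滨 (Li Yibin), 张天泽 (Zhang Tianze), 缪旭弘 (Miao Xuhong), 魏征 (Wei Zheng), 尤岳 (You Yue), 周广礼 (Zhou Guangli), 贾磊 (Jia Lei), 庄英豪 (Zhuang Yinghao), 宋艳 (Song Yan)

Owner: SHANDONG UNIV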

