
Method and system for variable mass underwater vehicle obstacle avoidance based on deep reinforcement learning

A reinforcement learning technology for underwater vehicles, applied to neural learning methods, design optimization/simulation, biological neural network models, etc., addressing problems such as slow or failed convergence of the actor-critic network.

Active Publication Date: 2022-04-29
SHANDONG UNIV

AI Technical Summary

Problems solved by technology

First, to broaden exploration, the DDPG algorithm adds a certain amount of noise N to the selected action A. This helps exploration in the early stage of training, but in the later stage, once the actor network already performs well, the noise degrades its performance and can even prevent convergence. Second, the DDPG algorithm does not use expert data, so early training relies on random data, which results in slow convergence.
[0007] To sum up, existing underwater vehicles have limited flexibility of movement. In methods that construct a dynamic model of the underwater vehicle and apply a DDPG-based reinforcement learning control algorithm, the exploration noise may negatively affect later training. DDPG is a typical off-policy learning algorithm: the learned policy is deterministic (a deterministic actor network), and the exploration component can use customized exploration noise. In the traditional method, Gaussian noise is added to the deterministic policy's output to form the executed action. Although this helps the agent explore more of the action space and select better actions, once the actor network already performs well, the noise increases the uncertainty of action selection and degrades the network's performance. Finally, in the early stage of DDPG training, when experience is scarce, the experience replay pool contains a large amount of random data, which greatly slows the convergence of the actor-critic network or even prevents convergence; excessive early randomness leads to excessively slow convergence.
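The noise problem described above suggests shrinking the exploration noise as the actor improves. Below is a minimal, hypothetical sketch of such an adaptive Gaussian noise schedule: the standard deviation decays whenever the latest episode reward beats the running average of past rewards. The class name, decay rule, and parameters are illustrative assumptions, not the patent's exact formulation.

```python
import numpy as np

class AdaptiveGaussianNoise:
    """Gaussian exploration noise whose scale decays as the policy improves.

    Hypothetical rule: when the latest episode reward exceeds the running
    average of historical rewards, the noise standard deviation is reduced,
    so a well-trained actor is perturbed less in late training.
    """

    def __init__(self, sigma=0.2, decay=0.95, sigma_min=0.01):
        self.sigma = sigma          # current noise standard deviation
        self.decay = decay          # multiplicative decay factor
        self.sigma_min = sigma_min  # floor so exploration never vanishes
        self.reward_history = []

    def sample(self, action_dim):
        # Noise vector added to the deterministic actor's output.
        return np.random.normal(0.0, self.sigma, size=action_dim)

    def update(self, episode_reward):
        # Decay only when the agent beat its own historical average.
        if self.reward_history and episode_reward > np.mean(self.reward_history):
            self.sigma = max(self.sigma * self.decay, self.sigma_min)
        self.reward_history.append(episode_reward)
```

In this sketch the noise never grows back; a variant could also re-inflate sigma when rewards regress, trading stability for renewed exploration.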

Method used



Examples


Embodiment 1

[0042] As shown in Figure 1, this embodiment provides an obstacle avoidance method for a variable-mass underwater vehicle based on deep reinforcement learning, including:

[0043] S1: Construct an obstacle avoidance simulation model based on a deep reinforcement learning network according to the motion state of the variable-mass underwater vehicle and the action of the actuator;

[0044] S2: Store the pre-acquired complete trajectories of historical obstacle avoidance tasks in the experience replay pool as expert data; obtain the current execution action from the initial motion state of the variable-mass underwater vehicle and Gaussian noise; obtain the new motion state and the reward value of the current action from the executed action; and store them in the experience replay pool;

[0045] S3: Train the obstacle avoidance simulation model based on the deep reinforcement learning network according to the experience...
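Step S2 above seeds the experience replay pool with expert trajectories so that early mini-batches are not purely random data. A minimal sketch of such a pool is below; the class name, tuple layout, and capacity are illustrative assumptions rather than the patent's specification.

```python
import random
from collections import deque

class ReplayPool:
    """Experience replay pool pre-seeded with expert transitions.

    Hypothetical sketch of the S2/S3 scheme: expert data fills the pool
    before training, and online transitions are appended afterwards, so
    early sampled batches already contain informative experience.
    """

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def seed_with_expert(self, expert_trajectory):
        # expert_trajectory: iterable of (state, action, reward, next_state, done)
        self.buffer.extend(expert_trajectory)

    def add(self, state, action, reward, next_state, done):
        # Online transition collected with the noisy actor.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random mini-batch for the actor-critic update.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```

Because `deque(maxlen=...)` evicts the oldest entries first, expert data is gradually displaced by on-policy experience as training proceeds.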

Embodiment 2

[0111] This embodiment provides a variable mass underwater vehicle obstacle avoidance system based on deep reinforcement learning, including:

[0112] The model construction module is configured to construct an obstacle avoidance simulation model based on a deep reinforcement learning network according to the motion state of the variable-mass underwater vehicle and the action of the actuator;

[0113] The experience acquisition module is configured to store the pre-acquired complete trajectories of historical obstacle avoidance tasks in the experience replay pool as expert data, obtain the current execution action from the initial motion state of the variable-mass underwater vehicle and Gaussian noise, obtain the new motion state and the reward value of the current action from the executed action, and store them in the experience replay pool;

[0114] The training module is configured to train the obstacle avoid...



Abstract

The invention discloses an obstacle avoidance method and system for a variable-mass underwater vehicle based on deep reinforcement learning, comprising: constructing an obstacle avoidance simulation model based on a deep reinforcement learning network according to the motion state of the variable-mass underwater vehicle and the action of its actuator; storing the pre-acquired complete trajectories of historical obstacle avoidance tasks in the experience replay pool as expert data; obtaining the current execution action from the initial motion state of the variable-mass underwater vehicle and Gaussian noise, obtaining the new motion state and the reward value of the current action from the executed action, and storing them in the experience replay pool; training the obstacle avoidance simulation model according to the experience replay pool, and updating the Gaussian noise according to the reward value of the current training action and the average reward value of historical training; and obtaining the driving path for the obstacle avoidance task from the trained obstacle avoidance simulation model. The DDPG network model based on deep reinforcement learning is improved to solve the obstacle avoidance problem of underwater vehicles.

Description

Technical field

[0001] The invention relates to the technical field of obstacle avoidance for underwater vehicles, and in particular to an obstacle avoidance method and system for variable-mass underwater vehicles based on deep reinforcement learning.

Background technique

[0002] The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.

[0003] In recent years, with the rapid development of technologies such as automatic control and artificial intelligence, more and more underwater vehicles are used in various underwater operations, performing tasks such as submarine navigation, exploration, and survey. Variable-mass underwater vehicles, which control their own ascent and descent by absorbing and discharging water, move more flexibly than ordinary underwater vehicles; however, the seabed environment is complex and contains many obstacles, and the...

Claims


Application Information

Patent Type & Authority Patents(China)
IPC(8): G06F30/15, G06F30/27, G06N3/04, G06N3/08
CPC: G06F30/15, G06F30/27, G06N3/084, G06N3/045, Y02T90/00
Inventors: 李沂滨, 李沐阳, 缪旭弘, 魏征, 尤岳, 周广礼, 贾磊, 庄英豪, 宋艳
Owner SHANDONG UNIV