An unmanned ship path planning method based on a Q learning neural network

Path planning and neural network technology, applied to the intelligent control of unmanned ships, that addresses problems such as path planning in unknown areas

Status: Inactive
Publication date: 2019-05-07
ZHEJIANG FORESTRY UNIVERSITY
5 Cites, 45 Cited by

AI Technical Summary

Problems solved by technology

However, this method needs to know the location of the environmental te


Examples


Embodiment 1

[0078] The unmanned ship path planning method based on a Q-learning neural network of this embodiment comprises the following steps:

[0079] a) Initialize the storage area D;

[0080] b) Initialize the Q network and the initial values of the state and action. The Q network contains the following elements: $S$, $A$, $P_{s,a}$, $R$, where $S$ represents the set of system states the USV can be in, $A$ represents the set of actions that the USV can take, $P_{s,a}$ represents the system state transition probability, and $R$ represents the reward function;

[0081] c) Randomly set the training target;

[0082] d) Randomly select an action $a_t$, obtain the current reward $r_t$ and the next state $s_{t+1}$, and store $(s_t, a_t, r_t, s_{t+1})$ in the storage area D;

[0083] e) Randomly sample a batch of data from the storage area D for training, that is, a ba...
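Steps a), d) and e) describe what is commonly called experience replay. The following is a minimal sketch of such a storage area, not the patent's implementation: the class name `ReplayStorage`, the `capacity` limit, and the `seed` parameter are assumptions introduced for illustration only.

```python
import random
from collections import deque


class ReplayStorage:
    """Hypothetical storage area D holding (s_t, a_t, r_t, s_next) tuples."""

    def __init__(self, capacity=10000, seed=None):
        # ASSUMPTION: a bounded buffer; the source does not state a capacity.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def store(self, s, a, r, s_next):
        # Step d): store one transition; the oldest is dropped when full.
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Step e): draw a random batch without replacement.
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


D = ReplayStorage(capacity=3, seed=0)
for t in range(4):
    D.store(s=t, a=0, r=0.0, s_next=t + 1)
batch = D.sample(2)
```

Sampling at random rather than replaying transitions in order breaks the correlation between consecutive steps of the same trajectory, which is the usual motivation for this design.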

Embodiment 2

[0088] The unmanned ship path planning method based on a Q-learning neural network of this embodiment builds on Embodiment 1. The traditional Q-learning algorithm is as follows:

[0089] Q-learning describes the problem as a Markov decision process (MDP). The Markov decision process contains 4 elements: $S$, $A$, $P_{s,a}$, $R$. $S$ represents the set of system states the USV can be in, that is, the current state of the USV and the state of the current environment, such as the size and position of obstacles. $A$ represents the set of actions that the USV can take, that is, the direction of rotation of the USV. $P_{s,a}$ represents the system model, that is, the system state transition probability: $P(s'|s,a)$ describes the probability of the system reaching state $s'$ after executing action $a$ in the current state $s$. $R$ represents the reward function, which is determined by the current state and the action taken. Think of Q-learning as an inc...
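Because the paragraph above is cut off before the update rule appears, the sketch below shows the standard textbook tabular Q-learning update rather than the patent's exact formula. The function name `q_learning_update`, the dictionary-based table, the state and action labels, and the default values of `alpha` and `gamma` are all assumptions made for illustration.

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one textbook Q-learning step in place and return the TD error.

    Q is a dict mapping (state, action) to a value; missing entries count as 0.
    """
    # Value of the best action available in the next state.
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    # Temporal-difference error between the bootstrapped target and Q(s, a).
    td_error = r + gamma * best_next - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
    return td_error


actions = ["left", "right"]          # hypothetical rotation directions
Q = {("B", "left"): 2.0}             # hypothetical prior estimate
td = q_learning_update(Q, "A", "right", 0.0, "B", actions, alpha=0.5, gamma=0.9)
```

In this usage the target is `0.0 + 0.9 * 2.0`, so half of that error is moved into the entry for `("A", "right")`.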

Embodiment 3

[0108] The unmanned ship path planning method based on a Q-learning neural network in this embodiment builds on Embodiment 2. As long as the future TD deviation values are unknown, the above update cannot be performed. However, they can be calculated incrementally by using traces. $\eta_t(s,a)$ is defined as a characteristic function: it returns 1 when $(s,a)$ occurs at time $t$, and 0 otherwise. For simplicity, ignoring the learning rate, define a trace $e_t(s,a)$ for each $(s,a)$:

[0109]

[0110]

[0111] Then at time t the online update is

[0112]-[0113] $$Q(s,a) = Q(s,a) + \alpha\left[\delta'_t\,\eta_t(s,a) + \delta_t\,e_t(s,a)\right] \tag{8}$$

[0114] Here, $Q(s,a)$ is the value of performing action $a$ in state $s$, $\alpha$ is the learning rate, $\eta_t(s,a)$ is the characteristic function, $e_t(s,a)$ is the trace, $\delta'_t$ represents the deviation value of past learning, and $\delta_t$ is the deviation value learned now; $\delta_t$ is the deviation value $\delta'$ between the cumulative return $R(s)$ and the current...
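Update (8) can be sketched as a single pass over the value table. The definitions of the trace and of the two deviation values are not visible in the text above, so this sketch takes them as inputs instead of computing them; the function name `online_update`, the dictionary layout, and the sample numbers are assumptions for illustration only.

```python
def online_update(Q, trace, state, action, delta_past, delta_now, alpha=0.1):
    """Apply update (8) in place to every (s, a) entry of Q.

    trace maps (s, a) to e_t(s, a); delta_past and delta_now stand for
    the two deviation values in (8) and are supplied by the caller.
    """
    for key in Q:
        # Characteristic function: 1 only for the pair visited at time t.
        eta = 1.0 if key == (state, action) else 0.0
        Q[key] += alpha * (delta_past * eta + delta_now * trace.get(key, 0.0))
    return Q


# Hypothetical two-entry table and trace values.
Q = {("s0", "left"): 0.0, ("s0", "right"): 1.0}
trace = {("s0", "left"): 0.5, ("s0", "right"): 0.0}
online_update(Q, trace, "s0", "right", delta_past=2.0, delta_now=1.0, alpha=0.1)
```

The visited pair receives the first deviation term through the characteristic function, while every other pair is adjusted only in proportion to its trace.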


Abstract

The invention discloses an unmanned ship path planning method based on a Q-learning neural network. The method comprises the following steps: a) initializing a storage area D; b) initializing a Q network and the initial values of the state and action; c) randomly setting a training target; d) randomly selecting an action $a_t$ to obtain the current reward $r_t$ and the next state $s_{t+1}$, and storing $(s_t, a_t, r_t, s_{t+1})$ in the storage area D; e) randomly sampling a batch of data $(s_t, a_t, r_t, s_{t+1})$ from the storage area D for training, where the state in which the USV reaches the target position or exceeds the maximum time of a round is regarded as the final state; f) if $s_{t+1}$ is not the final state, returning to step d); if $s_{t+1}$ is the final state, updating the Q network parameters, returning to step d), and ending the algorithm after n rounds; and g) setting a target and carrying out path planning with the trained Q network until the USV reaches the target position. The decision-making time is short, the path is better optimized, and the real-time requirement of online planning can be met.
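The control flow of steps a) to g) can be sketched on a toy problem. This is a simplified illustration under several assumptions, not the patented method: a lookup table stands in for the Q network, the 4x4 grid, the reward values, the fixed training target (step c) sets it randomly), and the names `step`, `train`, and `plan` are all hypothetical.

```python
import random

SIZE = 4                                        # hypothetical 4x4 grid of cells
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]    # hypothetical moves


def step(state, action, target):
    """Move one cell, clamped to the grid; the reward values are assumed."""
    r = min(max(state[0] + action[0], 0), SIZE - 1)
    c = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (r, c)
    return nxt, (1.0 if nxt == target else -0.01)


def train(target, rounds=500, max_time=50, batch_size=64,
          alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    D = []                                      # a) storage area D
    Q = {((r, c), k): 0.0                       # b) table standing in for the Q network
         for r in range(SIZE) for c in range(SIZE)
         for k in range(len(ACTIONS))}
    for _ in range(rounds):                     # repeat n rounds
        s = (0, 0)
        for t in range(max_time):
            a = rng.randrange(len(ACTIONS))     # d) random action
            s_next, reward = step(s, ACTIONS[a], target)
            reached = s_next == target
            D.append((s, a, reward, s_next, reached))
            s = s_next
            # Final state: target reached or maximum time of the round exceeded.
            if reached or t == max_time - 1:
                # e) random batch from D, f) update at the final state.
                for bs, ba, br, bn, done in rng.sample(D, min(batch_size, len(D))):
                    best = 0.0 if done else max(
                        Q[(bn, k)] for k in range(len(ACTIONS)))
                    Q[(bs, ba)] += alpha * (br + gamma * best - Q[(bs, ba)])
                break
    return Q


def plan(Q, start, target, max_time=50):
    """g) Follow the greedy action of the trained table toward the target."""
    path, s = [start], start
    while s != target and len(path) <= max_time:
        a = max(range(len(ACTIONS)), key=lambda k: Q[(s, k)])
        s, _ = step(s, ACTIONS[a], target)
        path.append(s)
    return path


Q = train(target=(3, 3))
path = plan(Q, start=(0, 0), target=(3, 3))
```

Because actions are chosen at random during training, the table is learned off-policy from stored transitions, and only the planning phase in `plan` acts greedily.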

Description

Technical field

[0001] The invention belongs to the field of intelligent control of unmanned ships, and in particular relates to a path planning method for unmanned ships based on a Q-learning neural network.

Background

[0002] Water quality monitoring is the main method for evaluating water quality and preventing water pollution. With the increase in industrial wastewater, water pollution is becoming more and more serious, and dynamic monitoring of water pollution is urgently needed. However, because traditional water quality monitoring involves many steps and takes a long time, the diversity and accuracy of the data obtained fall far short of what decision-making requires. In response to these problems, a variety of water quality monitoring methods have been proposed. For example, Cao Lijie and others proposed obtaining a more accurate water quality inversion model by establishing a sensor network. Tian Ye et al. proposed to invert the ...


Application Information

IPC(8): G06Q10/04, G06N3/08
Inventor: 冯海林, 吕扬民, 方益明, 周国模
Owner ZHEJIANG FORESTRY UNIVERSITY