
Reinforcement Learning Algorithm Based on Immune Tolerance Mechanism

A reinforcement learning technique based on immune tolerance, applied in the field of reinforcement learning algorithms. It addresses the algorithm's tendency to fall into local extrema and fail to converge, and aims to preserve global optimization ability.

Inactive Publication Date: 2016-02-24
XIAN UNIV OF TECH
2 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

TD(λ) based on value function approximation has attracted increasing attention, but the algorithm is prone to falling into local extrema and failing to converge.



Examples


Embodiment

[0109] The implementation process of the reinforcement learning algorithm based on the immune tolerance mechanism in the present invention will be illustrated below through an example of robot path planning.

[0110] (1) First, determine the path map of the robot. A 20×20 grid map is used, represented by a matrix M, in which element 0 represents a passable area and element 1 represents an obstacle.
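A minimal sketch of such a grid map, using plain Python lists. The helper names `make_grid` and `is_passable` and the obstacle coordinates are hypothetical and chosen only for illustration; the source specifies only the 20×20 size and the 0/1 encoding.

```python
# Hypothetical helpers, not taken from the patent text.
def make_grid(size=20, obstacles=()):
    """Build a size x size map M: 0 = passable area, 1 = obstacle."""
    M = [[0] * size for _ in range(size)]
    for row, col in obstacles:
        M[row][col] = 1
    return M


def is_passable(M, row, col):
    """True if (row, col) lies inside the map and is not an obstacle."""
    rows, cols = len(M), len(M[0])
    return 0 <= row < rows and 0 <= col < cols and M[row][col] == 0


# Example obstacle positions are arbitrary illustrative values.
M = make_grid(obstacles=[(3, 4), (3, 5), (10, 10)])
```

Treating out-of-bounds cells as impassable is a design choice of this sketch; it lets later steps query any neighbouring cell without separate bounds checks.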

[0111] (2) Secondly, initialize parameters, see step 1.

[0112] (3) Starting from the starting position, if the position of the robot has essentially not changed within k time steps, that is, the distance between the position k steps before the current time step and the current position is less than a certain threshold D_max, then use immune optimization on the learning system and jump to (4); otherwise jump to (5).
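The stall check in step (3) can be sketched as follows. This sketch reads the condition as "the robot has moved less than D_max over the last k steps", which is an assumption about the intended meaning. The class name `StallDetector`, the use of Euclidean distance, and the example values of k and D_max are also assumptions, not details from the patent.

```python
import math
from collections import deque


class StallDetector:
    """Flags when the robot has barely moved over the last k time steps."""

    def __init__(self, k, d_max):
        self.k = k
        self.d_max = d_max
        # Holds the current position plus the k positions before it.
        self.history = deque(maxlen=k + 1)

    def update(self, position):
        """Record a position; return True if immune optimization should run."""
        self.history.append(position)
        if len(self.history) <= self.k:
            return False  # not enough history to look k steps back
        position_k_steps_ago = self.history[0]
        # Euclidean distance is an assumption; the source does not name a metric.
        return math.dist(position_k_steps_ago, position) < self.d_max


detector = StallDetector(k=3, d_max=1.5)
flags = [detector.update(p) for p in [(0, 0), (1, 0), (0, 0), (1, 0), (5, 5)]]
```

In this usage the fourth position triggers the check because the robot is only one cell away from where it was three steps earlier; the jump to (5, 5) clears it again.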

[0113] (4) Execute steps 3 to 7 for the weights in the neural network.

[0114] (5) As shown in Figure 6, the 8 locations adjacent to the current location are...
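The source text is truncated at this point, so what is done with the 8 adjacent locations is not shown. As a hedged illustration only, the neighbouring cells of a grid position can be enumerated like this; the function name `candidate_moves` and the filtering by map bounds and obstacles are assumptions.

```python
# Offsets to the 8 cells surrounding a grid position (row, col).
NEIGHBOR_OFFSETS = [
    (dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)
]


def candidate_moves(M, row, col):
    """Return the adjacent cells that are inside the map M and passable (0)."""
    rows, cols = len(M), len(M[0])
    moves = []
    for dr, dc in NEIGHBOR_OFFSETS:
        r, c = row + dr, col + dc
        # Assumption: cells outside the map or marked 1 are not valid moves.
        if 0 <= r < rows and 0 <= c < cols and M[r][c] == 0:
            moves.append((r, c))
    return moves


# Small 3x3 example map with a single obstacle at (0, 1).
example_map = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
center_moves = candidate_moves(example_map, 1, 1)
corner_moves = candidate_moves(example_map, 0, 0)
```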



Abstract

In the reinforcement learning algorithm based on the immune tolerance mechanism, the basis function vector and weight vector of TD(λ) are designed first. The weight vector is then encoded as floating-point numbers. When the error between the system and the real environment is greater than a certain threshold, this is regarded as the primary (first) response in the artificial immune system: the environment is being encountered for the first time, so the immune tolerance mechanism is used for optimization, and memory, namely antibodies, is used to memorize the environmental knowledge. The optimal strategy is then selected according to the current system parameters, the system parameters are updated according to the reward value r fed back by the environment, and the algorithm continues to the next iteration. When the error between the system and the real environment is less than the threshold, a similar environment is considered to have been encountered before, and this is regarded as a secondary response in the artificial immune system: using the system parameters, the system decides the action selection and chooses the optimal strategy.
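For orientation, here is a minimal sketch of one textbook TD(λ) update with a linear value function (basis function vector times weight vector) and accumulating eligibility traces, together with the threshold test that separates the two immune responses. This is the standard TD(λ) rule, not the patent's specific procedure. The function names, the step size `alpha`, discount `gamma`, and trace decay `lam` are illustrative assumptions, and the immune tolerance optimization itself is not shown because the available text does not describe it.

```python
def td_lambda_step(w, e, phi_s, phi_next, r, alpha=0.1, gamma=0.9, lam=0.8):
    """One linear TD(lambda) update with accumulating traces.

    w: weight vector, e: eligibility trace vector,
    phi_s / phi_next: basis function vectors of the current and next state,
    r: reward fed back by the environment.
    alpha, gamma, lam are assumed example values.
    """
    v_s = sum(wi * pi for wi, pi in zip(w, phi_s))
    v_next = sum(wi * pi for wi, pi in zip(w, phi_next))
    delta = r + gamma * v_next - v_s  # temporal-difference error
    e = [gamma * lam * ei + pi for ei, pi in zip(e, phi_s)]
    w = [wi + alpha * delta * ei for wi, ei in zip(w, e)]
    return w, e, delta


def immune_response(error, threshold):
    """Classify the situation by the system-vs-environment error.

    Assumption: an error exactly equal to the threshold is treated as
    'secondary'; the source does not specify this case.
    """
    return "primary" if error > threshold else "secondary"


w, e, delta = td_lambda_step(
    w=[0.0, 0.0], e=[0.0, 0.0], phi_s=[1.0, 0.0], phi_next=[0.0, 1.0], r=1.0
)
```

With zero initial weights the TD error equals the reward, so only the weight of the active basis function moves, by `alpha * r`.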

Description

Technical field

[0001] The invention relates to a reinforcement learning algorithm based on an immune tolerance mechanism.

Background technique

[0002] Reinforcement learning is a class of machine learning algorithms that lies between supervised learning and unsupervised learning. It originated in behavioral psychology, was developed in the 1980s, and is now widely used in game competitions, control systems, scheduling management, and robotics. It is a hotspot in machine learning research.

[0003] Reinforcement learning can learn from the environment based on deterministic or non-deterministic rewards without knowing the model. Typical reinforcement learning algorithms include the Sarsa learning algorithm, the Q learning algorithm, and the TD(λ) learning algorithm. The TD(λ) learning algorithm includes tabular TD(λ) and TD(λ) based on value function approximation. In the Sarsa learning algorithm, Q learning algorithm, and tabular TD(λ), a large amount of storage space is requir...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06N3/00, G06N3/08
Inventor: 王磊, 黑新宏, 金海燕, 林叶, 王玉
Owner: XIAN UNIV OF TECH