Reinforcement Learning Algorithm Based on Immune Tolerance Mechanism

An immune tolerance and reinforcement learning technology, applied in the field of reinforcement learning algorithms. It addresses the algorithm's tendency to fall into local extrema and fail to converge, and it preserves the algorithm's global optimization ability.

Inactive Publication Date: 2016-02-24

XIAN UNIV OF TECH



AI Technical Summary

Problems solved by technology

TD(λ) based on value function approximation has attracted increasing attention, but the algorithm is prone to falling into local extrema and may fail to converge.


Embodiment

[0109] The implementation process of the reinforcement learning algorithm based on the immune tolerance mechanism in the present invention will be illustrated below through an example of robot path planning.

[0110] (1) First, determine the robot's path map. A 20×20 grid map is used, represented by a matrix M, in which element 0 denotes a passable cell and element 1 denotes an obstacle.

[0111] (2) Second, initialize the parameters (see step 1).

[0112] (3) Starting from the start position, if the robot's position has essentially not changed within k time steps, that is, the distance between its position k steps before the current time step and its current position is less than a certain threshold D_max, then use immune optimization on the learning system and jump to (4); otherwise jump to (5).

[0113] (4) Execute steps 3 to 7 for the weights in the neural network.

[0114] (5) As shown in Figure 6, the 8 locations adjacent to the current location are...
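Steps (1), (3) and (5) of the embodiment can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the obstacle layout is hypothetical (the source gives none), the names `is_stuck` and `passable_neighbors` are illustrative, the distance is assumed to be Euclidean, and "basically does not change" is read as a displacement below D_max. How the 8 neighbours are used afterwards is cut off in the source, so the sketch stops at listing them.

```python
import math

# Step (1): 20x20 grid map M, 0 = passable cell, 1 = obstacle.
SIZE = 20
M = [[0] * SIZE for _ in range(SIZE)]
# Hypothetical obstacles for illustration only; the source gives no layout.
for col in range(3, 9):
    M[5][col] = 1
for row in range(10, 15):
    M[row][12] = 1


def is_stuck(positions, k, d_max):
    """Step (3): True if the robot moved less than d_max over the last k steps.

    positions is the list of visited (row, col) cells, newest last.
    Assumption: Euclidean distance; the source does not name the metric.
    """
    if len(positions) <= k:
        return False  # not enough history to look k steps back
    return math.dist(positions[-1 - k], positions[-1]) < d_max


# Step (5): offsets to the 8 cells surrounding the current cell.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
                    (0, -1),           (0, 1),
                    (1, -1),  (1, 0),  (1, 1)]


def passable_neighbors(grid, row, col):
    """Return the adjacent cells that are inside the grid and not obstacles."""
    result = []
    for dr, dc in NEIGHBOR_OFFSETS:
        r, c = row + dr, col + dc
        if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0:
            result.append((r, c))
    return result
```

In a control loop, `is_stuck` would decide between the branch to (4) and the branch to (5), and `passable_neighbors(M, row, col)` would supply the candidate moves for (5).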

Abstract

In the reinforcement learning algorithm based on the immune tolerance mechanism, the basis function vector and weight vector of TD(λ) are designed first. The weight vector is then encoded as floating-point numbers. When the error between the system and the real environment is greater than a certain threshold, this is regarded as the primary response of the artificial immune system: the environment is being encountered for the first time, the immune tolerance mechanism is used for optimization, and the memory, namely the antibodies, is used to store the environmental knowledge. The optimal strategy is then selected according to the current system parameters, the system parameters are updated according to the reward value r fed back by the environment, and the next iteration begins. When the error between the system and the real environment is less than the threshold, a similar environment is considered to have been encountered before, and this is regarded as the secondary response of the artificial immune system: using the system parameters, the system judges the action selection and chooses the optimal strategy.
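As a rough illustration of the pieces named in the abstract, the sketch below shows a textbook linear TD(λ) update with accumulating eligibility traces, where the weight vector is a plain list of floats (one reading of "encoded as floating-point numbers"), together with a threshold test that separates the primary from the secondary response. The function names, the default values of `alpha`, `gamma` and `lam`, and the way `error` is obtained are assumptions; the abstract does not define them, and the immune tolerance optimization itself is not reproduced here.

```python
def td_lambda_step(theta, trace, phi_s, phi_next, reward,
                   alpha=0.1, gamma=0.9, lam=0.8):
    """One linear TD(lambda) update with accumulating eligibility traces.

    theta    : weight vector (list of floats)
    trace    : eligibility trace, same length as theta
    phi_s    : basis function vector of the current state
    phi_next : basis function vector of the next state
    Returns the updated weights, the updated trace and the TD error.
    """
    v_s = sum(w * f for w, f in zip(theta, phi_s))
    v_next = sum(w * f for w, f in zip(theta, phi_next))
    delta = reward + gamma * v_next - v_s              # TD error
    new_trace = [gamma * lam * e + f for e, f in zip(trace, phi_s)]
    new_theta = [w + alpha * delta * e for w, e in zip(theta, new_trace)]
    return new_theta, new_trace, delta


def response_type(error, threshold):
    """Classify the situation as in the abstract.

    error above the threshold  -> "primary" (new environment, run the
                                  immune tolerance optimization)
    error below the threshold  -> "secondary" (similar environment seen
                                  before, reuse the stored parameters)
    How `error` is measured is an assumption left to the caller.
    """
    return "primary" if error > threshold else "secondary"
```

A caller would check `response_type` before each decision and apply `td_lambda_step` after receiving the reward r from the environment.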

Description

Technical field

[0001] The invention relates to a reinforcement learning algorithm based on an immune tolerance mechanism.

Background technique

[0002] Reinforcement learning is a kind of machine learning algorithm between supervised learning and unsupervised learning. It originated in behavioral psychology, was developed in the 1980s, and is now widely used in game competitions, control systems, scheduling management, and robotics. It is a hotspot in the field of machine learning research.

[0003] Reinforcement learning can learn the environment based on deterministic or non-deterministic rewards without knowing the model. Typical reinforcement learning algorithms are the Sarsa learning algorithm, the Q learning algorithm, and the TD(λ) learning algorithm. The TD(λ) learning algorithm includes tabular TD(λ) and TD(λ) based on value function approximation. In the Sarsa learning algorithm, Q learning algorithm, and tabular TD(λ), a large amount of storage space is requir...


Application Information


Patent Type & Authority: Patents (China)

IPC: IPC(8): G06N3/00, G06N3/08

Inventor: 王磊, 黑新宏, 金海燕, 林叶, 王玉

Owner: XIAN UNIV OF TECH
