Agent training method based on DQN

A DQN-based agent training method that addresses slow network convergence, low execution efficiency, and long training time, with the effect of reducing the time consumed by training, improving immediacy, and raising execution efficiency.

Pending Publication Date: 2022-02-18

XIAN TECHNOLOGICAL UNIV

Problems solved by technology

Depending on the specific problem, the agent often obtains little effective reward information in the early stage of the algorithm, so network convergence cannot be accelerated effectively. With little effective environmental information available early on, blind exploration consumes a lot of time and reduces execution efficiency. As a result, training the agent on a game problem takes too long and immediacy is poor.

Examples

Embodiment 1

[0048] Please refer to Figure 1 and Figure 2. Figure 1 is a schematic diagram of a DQN-based agent training method provided by an embodiment of the present invention; Figure 2 is a schematic diagram of a DQN network parameter optimization process provided by an embodiment of the present invention. As shown in the figures, the DQN-based agent training method of this embodiment includes:

[0049] S1: Randomly initialize the network parameters of the DQN network to obtain several initial DQN networks;

[0050] The DQN network consists of three sets of connections: input layer to hidden layer, hidden layer to hidden layer, and hidden layer to output layer. Since reinforcement learning is an unsupervised learning problem, each network output also serves as the network input of the next iteration, and the parameters are updated by gradient descent.

[0051] In this embodiment, the network parameter encoding method of the DQN network adopts floating-point encoding, specifically, the network p...
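The encoding step can be illustrated with a minimal sketch, assuming that "floating-point encoding" means flattening every weight and bias of one network into a single list of floats (one individual). The layer sizes, the uniform initialization range, and the helper names `init_network`, `encode`, and `decode` are hypothetical and are not taken from the patent text.

```python
import random

def init_network(layer_sizes, rng):
    """Return one (weights, biases) pair per pair of adjacent layers."""
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers

def encode(layers):
    """Flatten all weights and biases into one list of floats (an individual)."""
    genome = []
    for weights, biases in layers:
        for row in weights:
            genome.extend(row)
        genome.extend(biases)
    return genome

def decode(genome, layer_sizes):
    """Rebuild the layer structure from a flat genome; inverse of encode."""
    layers, pos = [], 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = []
        for _ in range(n_out):
            weights.append(genome[pos:pos + n_in])
            pos += n_in
        biases = genome[pos:pos + n_out]
        pos += n_out
        layers.append((weights, biases))
    return layers

# Assumed sizes: 4 state inputs, two hidden layers of 8 units, 2 actions.
LAYER_SIZES = [4, 8, 8, 2]
rng = random.Random(0)

# Step S1 plus encoding: several randomly initialized networks, one genome each.
population = [encode(init_network(LAYER_SIZES, rng)) for _ in range(5)]
```

Because `decode` inverts `encode`, an individual selected later by the evolutionary search can be written back into a network of the same shape before gradient-based training starts.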

Embodiment 2

[0108] In this embodiment, the effect of the DQN-based agent training method of Embodiment 1 is verified and explained through a specific case. See Figure 3, which is a structure diagram of a test case model provided by the embodiment of the present invention.

[0109] The experimental environment used in the test case of this embodiment is the open classic control model "CartPole-v1" from OpenAI Gym. The present invention improves on the original model and uses MATLAB R2018b for simulation. Table 1 and Table 2 give the parameter names and value ranges involved in the model.

[0110] Table 1 CartPole-v1 state (State) information

[0111]

[0112]

[0113] Table 2 CartPole-v1 action (Action) information

[0114]

[0115] This embodiment redefines the state variable (State), action space (Action), movement space of the inverted pendulum trolley, range of balance angle of the inverted pendulum and rendering of the picture of the m...
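The redefined model is not shown in this excerpt. As a point of reference only, the following sketch reproduces one simulation step of the stock CartPole-v1 environment as distributed with OpenAI Gym, written in plain Python rather than MATLAB. The constants and termination limits are the stock Gym values, not the modified ranges this embodiment describes, and the function name `step` is illustrative.

```python
import math

# Stock CartPole-v1 constants (assumed; the embodiment redefines some of these).
GRAVITY, MASS_CART, MASS_POLE = 9.8, 1.0, 0.1
HALF_POLE_LENGTH, FORCE_MAG, TAU = 0.5, 10.0, 0.02
X_LIMIT = 2.4                            # cart position limit
THETA_LIMIT = 12 * 2 * math.pi / 360     # pole angle limit, 12 degrees in radians

def step(state, action):
    """Advance (x, x_dot, theta, theta_dot) by one time step of length TAU.

    action 1 pushes the cart right, any other value pushes it left.
    Returns (next_state, reward, done).
    """
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    total_mass = MASS_CART + MASS_POLE
    pole_mass_length = MASS_POLE * HALF_POLE_LENGTH
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    temp = (force + pole_mass_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_POLE_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / total_mass)
    )
    x_acc = temp - pole_mass_length * theta_acc * cos_t / total_mass

    # Explicit Euler integration.
    x += TAU * x_dot
    x_dot += TAU * x_acc
    theta += TAU * theta_dot
    theta_dot += TAU * theta_acc

    done = abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT
    return (x, x_dot, theta, theta_dot), 1.0, done
```

The four-element state and two-valued action correspond to the quantities that Table 1 and Table 2 are captioned to describe.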

Abstract

The invention relates to a DQN-based agent training method comprising the following steps: randomly initializing the network parameters of DQN networks to obtain a plurality of initial DQN networks; mapping all network parameters of each initial DQN network to one individual, with all individuals forming an initial population; performing a differential evolution operation on the initial population to obtain a new generation of the network parameter population, and taking that new generation as the next initial population to repeat the differential evolution operation until a preset evolution condition is reached, yielding a final network parameter population; evaluating each individual in the final network parameter population with a preset fitness function and outputting the information of the optimal individual; initializing the network parameters of the DQN network according to the information of the optimal individual; and training the parameter-initialized DQN network to obtain the agent. In the DQN training process, the method improves execution efficiency, reduces the time needed for training, and improves the immediacy of agent training in game problems.
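The evolutionary search in the abstract can be sketched as follows. The text visible here does not state which differential evolution variant, scale factor, crossover rate, or fitness function the invention uses, so this sketch assumes the common DE/rand/1/bin scheme, illustrative values `f=0.5` and `cr=0.9`, a fixed generation count as the stopping condition, and a toy fitness function that is minimized. All names are hypothetical.

```python
import random

def differential_evolution(fitness, population, generations, f=0.5, cr=0.9, rng=None):
    """Minimize `fitness` with DE/rand/1/bin; return (best_individual, best_score)."""
    if rng is None:
        rng = random.Random()
    pop = [list(ind) for ind in population]   # copy so the input is left untouched
    scores = [fitness(ind) for ind in pop]
    dim = len(pop[0])
    for _ in range(generations):
        new_pop, new_scores = [], []
        for i in range(len(pop)):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(len(pop)) if j != i], 3)
            mutant = [pop[a][k] + f * (pop[b][k] - pop[c][k]) for k in range(dim)]
            # Binomial crossover: at least one gene always comes from the mutant.
            forced = rng.randrange(dim)
            trial = [mutant[k] if (rng.random() < cr or k == forced) else pop[i][k]
                     for k in range(dim)]
            # Greedy selection: the trial replaces the parent only if it is no worse.
            trial_score = fitness(trial)
            if trial_score <= scores[i]:
                new_pop.append(trial)
                new_scores.append(trial_score)
            else:
                new_pop.append(pop[i])
                new_scores.append(scores[i])
        pop, scores = new_pop, new_scores
    best = min(range(len(pop)), key=scores.__getitem__)
    return pop[best], scores[best]

def toy_fitness(individual):
    """Stand-in fitness (sum of squares); the patent's fitness function is not shown."""
    return sum(x * x for x in individual)

rng = random.Random(1)
initial_population = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(12)]
best_params, best_score = differential_evolution(
    toy_fitness, initial_population, generations=50, rng=rng
)
```

In the setting the abstract describes, each individual would be the flattened parameters of one DQN network, and `best_params` would be written back into the network before ordinary DQN training begins.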

Description

Technical field

[0001] The invention belongs to the technical field of artificial intelligence, and in particular relates to a DQN-based agent training method.

Background

[0002] Deep Reinforcement Learning (DRL), currently a hot topic in the field of artificial intelligence, integrates the ideas of reinforcement learning and deep learning, and effectively solves technical problems that reinforcement learning alone cannot solve. For decision-making problems with continuous action and state spaces, deep reinforcement learning uses neural networks to fit the continuous state and action spaces efficiently, and provides a more effective solution for agents exploring complex high-dimensional environments.

[0003] As the pioneering work of deep reinforcement learning, Deep Q Network (DQN) effectively handles model-free problems with large, continuous state spaces. Its main idea is still similar to reinforcement...
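The background paragraph is cut off before it describes how DQN learns. For orientation, the sketch below computes the standard one-step temporal-difference target used in DQN training, r + gamma * max Q(s', a'), with no bootstrap term on terminal transitions. This is general DQN background rather than text from the patent; the function name `td_targets` and the sample numbers are illustrative.

```python
def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """Return the one-step Q-learning target for each transition in a batch.

    rewards:       list of immediate rewards r
    next_q_values: list of per-action Q-value lists for the next state s'
    dones:         list of booleans, True where s' is terminal
    """
    targets = []
    for reward, q_next, done in zip(rewards, next_q_values, dones):
        # Terminal transitions have no future value to bootstrap from.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(reward + bootstrap)
    return targets

targets = td_targets(
    rewards=[1.0, 1.0],
    next_q_values=[[0.5, 2.0], [3.0, 1.0]],
    dones=[False, True],
    gamma=0.9,
)
```

The network is then trained by gradient descent to bring its prediction for the taken action closer to these targets.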

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06N3/04, G06N3/08, A63F13/822

CPC: G06N3/086, A63F13/822, A63F2300/807, G06N3/045

Inventor: 曹子建, 贾浩文, 傅妍芳, 容晓峰, 杜志强, 王振雨, 李骁, 李建

Owner: XIAN TECHNOLOGICAL UNIV
