State distribution perception sampling-based deep-value-function learning method of agent

A value-function and state-distribution technique in the field of reinforcement learning. It addresses problems such as large differences in sample quantity across states, and has the effects of enhancing representational ability, solving the sample selection problem, and improving learning quality.

Active Publication Date: 2018-10-12

ZHEJIANG UNIV

Cites: 7; Cited by: 3

AI Technical Summary

Problems solved by technology

However, this method does not fundamentally solve two problems:
1. Samples from different states are of similar importance, but the numbers generated for each state differ greatly. By what criterion should samples be drawn from the empirical data set so that redundant samples are not oversampled?
2. Because the samples are high-dimensional, numerous, and continuously generated, analyzing a large number of high-dimensional samples effectively is a key factor. How can samples be drawn efficiently from a large, continuously growing sample set?
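
The second problem, sampling efficiently from a large and continuously growing sample set, can be illustrated with a minimal sketch that keeps one list per state cluster, so that inserting a sample and drawing a cluster-balanced sample both take constant time. The class `BucketedReplayBuffer`, its `bucket_key` argument, and the cluster sizes in the usage block are hypothetical; the patent excerpt does not specify this data structure.

```python
import random
from collections import defaultdict


class BucketedReplayBuffer:
    """Hypothetical replay buffer that stores samples grouped by cluster key.

    Both add() and drawing one cluster-balanced sample cost O(1), so the
    buffer can keep up with a continuously growing stream of experience.
    """

    def __init__(self, seed=None):
        self._buckets = defaultdict(list)  # cluster key -> stored samples
        self._keys = []                    # keys of non-empty clusters
        self._rng = random.Random(seed)

    def add(self, sample, bucket_key):
        # Record the key the first time a cluster receives a sample.
        if bucket_key not in self._buckets:
            self._keys.append(bucket_key)
        self._buckets[bucket_key].append(sample)

    def sample_cluster_balanced(self, batch_size):
        """Draw a batch with replacement: pick a cluster, then a member."""
        if not self._keys:
            raise ValueError("buffer is empty")
        batch = []
        for _ in range(batch_size):
            key = self._rng.choice(self._keys)
            batch.append(self._rng.choice(self._buckets[key]))
        return batch


if __name__ == "__main__":
    buf = BucketedReplayBuffer(seed=0)
    for i in range(900):
        buf.add(("A", i), "A")  # frequently visited state cluster
    for i in range(100):
        buf.add(("B", i), "B")  # rarely visited state cluster
    batch = buf.sample_cluster_balanced(1000)
    share_b = sum(1 for s in batch if s[0] == "B") / len(batch)
    print(f"share of rare cluster B: {share_b:.2f}")  # expected near 0.5, not 0.1
```

Keeping the list of non-empty keys alongside the buckets avoids scanning the whole buffer on every draw, which matters when the buffer holds many high-dimensional samples.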

Examples

Embodiment

[0059] This embodiment is implemented as described above. The specific steps are not repeated here; only the results on case data are shown below.

[0060] First, a hashing method is used to reduce the dimensionality of, and to cluster, the abstract representations that the convolutional neural network produces from the states observed by the agent, so that the distribution over the state space can be perceived. On this basis, samples are selected from the empirical data set. Finally, the selected samples are used to train the agent's value function, giving it a more accurate assessment of its environment. The results are shown in Figures 1, 2 and 3.
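
The hashing step can be sketched as follows. The text only says "the hash method", so random-hyperplane hashing (SimHash) is assumed here as one concrete choice: each feature vector is mapped to a short bit code, and samples sharing a code form one cluster. The names `make_simhash` and `cluster_by_hash` are hypothetical.

```python
import random


def make_simhash(dim, num_bits, seed=0):
    """Return a function mapping a dim-dimensional feature vector to a
    num_bits-bit integer code by random-hyperplane hashing (SimHash).

    Assumption: SimHash stands in for the unspecified hash method.
    """
    rng = random.Random(seed)
    # Each hyperplane is a random Gaussian direction in feature space.
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

    def hash_fn(features):
        if len(features) != dim:
            raise ValueError("feature vector has wrong dimension")
        code = 0
        for bit, plane in enumerate(planes):
            dot = sum(p * x for p, x in zip(plane, features))
            if dot >= 0.0:  # which side of this hyperplane the vector lies on
                code |= 1 << bit
        return code

    return hash_fn


def cluster_by_hash(feature_set, hash_fn):
    """Group sample indices by hash code; each distinct code is one cluster."""
    clusters = {}
    for idx, features in enumerate(feature_set):
        clusters.setdefault(hash_fn(features), []).append(idx)
    return clusters


if __name__ == "__main__":
    h = make_simhash(dim=4, num_bits=8, seed=1)
    feats = [[1.0, 0.5, -0.2, 0.3], [2.0, 1.0, -0.4, 0.6], [-1.0, 0.2, 0.9, -0.5]]
    print(cluster_by_hash(feats, h))  # first two vectors share a code
```

In the method, `feature_set` would hold the CNN features of the observed states; fewer bits give coarser clusters, and vectors pointing in similar directions tend to land in the same bucket.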

[0061] Figure 1 visualizes the samples obtained by applying steps S1 and S2 of the present invention to the original empirical data, i.e., a schematic diagram of the distribution of the samples in the state space;

[0062] Figure 2 shows the results of adopting three sampling methods, namely a) ...

Abstract

The invention discloses a deep value-function learning method for an agent based on state-distribution-aware sampling. The method enables the agent to learn a value function more quickly from fewer samples, and specifically includes the following steps: 1) obtaining the empirical data that the agent uses to learn the value function, and defining the algorithm objective; 2) preprocessing the empirical data with a convolutional neural network to obtain a more expressive feature set; 3) clustering the empirical data set in its feature space with an unsupervised method; 4) sampling according to the state distribution of the empirical data set, using a state-distribution-aware sampling method that interpolates between uniform sampling and equal-probability sampling over clusters; and 5) using the sampled data for the agent's value-function learning. The method is suitable for game problems in the field of reinforcement learning and can quickly achieve better results with fewer samples.
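
Step 4 can be sketched as a per-sample probability that mixes the two schemes. The exact interpolation rule is not given in this excerpt; a linear mixture with weight `lam` is assumed: P(i) = (1 - lam) / N + lam / (K * n_c(i)), where N is the number of samples, K the number of clusters, and n_c(i) the size of the cluster containing sample i. The function names and cluster sizes below are hypothetical.

```python
import random


def interpolated_sampling_probs(cluster_ids, lam):
    """Per-sample probabilities mixing uniform and cluster-equal sampling.

    Assumed rule: P(i) = (1 - lam) / N + lam / (K * n_c(i)).
    lam = 0 gives uniform sampling over samples; lam = 1 gives every
    cluster the same total probability 1 / K.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if not cluster_ids:
        raise ValueError("cluster_ids is empty")
    n = len(cluster_ids)
    sizes = {}
    for c in cluster_ids:
        sizes[c] = sizes.get(c, 0) + 1
    k = len(sizes)
    return [(1.0 - lam) / n + lam / (k * sizes[c]) for c in cluster_ids]


def sample_batch(cluster_ids, lam, batch_size, rng=None):
    """Draw batch_size sample indices with replacement under the mixture."""
    if rng is None:
        rng = random.Random()
    probs = interpolated_sampling_probs(cluster_ids, lam)
    return rng.choices(range(len(cluster_ids)), weights=probs, k=batch_size)


if __name__ == "__main__":
    ids = ["A"] * 900 + ["B"] * 100  # cluster labels, e.g. from step 3
    for lam in (0.0, 0.5, 1.0):
        mass_b = sum(interpolated_sampling_probs(ids, lam)[900:])
        print(f"lam={lam}: probability mass on cluster B = {mass_b:.2f}")
    # prints 0.10, 0.30 and 0.50 for the three values of lam
```

A single weight makes the trade-off explicit: small `lam` keeps the empirical state distribution, while large `lam` gives rarely visited state clusters as much weight as common ones.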

Description

Technical field

[0001] The present invention belongs to the field of reinforcement learning, a branch of machine learning, and particularly relates to a sample sampling method based on perceiving the state distribution of empirical data.

Background

[0002] Sample selection is an important issue in machine learning, and different selection methods directly affect the quality of what a model learns. In reinforcement learning, sampling from the empirical data set helps overcome sample correlation and the forgetting of early samples. The goal of sampling is to select, from the sample set, samples that accelerate model convergence and strengthen the agent's ability to perceive its environment. Traditional methods generally sample from the empirical data set uniformly at random, which easily leads to sample imbalance and slows the agent's learning.

[0003] The existing sampl...
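
To make the imbalance concrete, here is a short worked sketch with hypothetical cluster sizes (900 samples from one state cluster, 100 from another). Under uniform random sampling each stored sample has probability 1/N, so a batch reproduces the buffer's proportions on average.

```python
from collections import Counter


def expected_batch_composition(cluster_ids, batch_size):
    """Expected count per cluster in a batch drawn uniformly at random.

    Every stored sample has probability 1 / N, so a cluster holding m of
    the N samples contributes batch_size * m / N samples on average.
    """
    counts = Counter(cluster_ids)
    n = len(cluster_ids)
    return {c: batch_size * m / n for c, m in counts.items()}


if __name__ == "__main__":
    ids = ["A"] * 900 + ["B"] * 100  # hypothetical imbalanced buffer
    print(expected_batch_composition(ids, 32))  # {'A': 28.8, 'B': 3.2}
```

With a batch of 32, the rarely visited cluster contributes only about three samples per update, even if those states matter as much to the value function as the common ones.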

Application Information

IPC (8): G06N99/00; G06K9/62

CPC: G06F18/2321; G06F18/24

Inventors: Li Xi (李玺), Li Weichao (李伟超), Huang Fuxian (皇福献)

Owner: ZHEJIANG UNIV
