
State distribution perception sampling-based deep-value-function learning method of agent

A technology involving value functions and state distributions, applied in the field of reinforcement learning. It can solve problems such as large differences in the number of samples generated across states, and achieves the effects of enhancing expressive power, solving the sample selection problem, and improving learning quality.

Active Publication Date: 2018-10-12
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

However, this method does not fundamentally solve two problems: 1. Samples from different states can be of similar importance yet differ greatly in how many of them are generated; by what criterion should one sample from the empirical data set so as to avoid oversampling redundant samples? 2. Since the samples themselves are very high-dimensional, huge in number, and continuously generated, effectively analyzing a large number of high-dimensional samples is a key issue; how can one efficiently sample from a large, continuously growing sample set?



Examples


Embodiment

[0059] The implementation of this embodiment follows the method described above; the specific steps are not repeated here, and only the results on case data are shown below.

[0060] First, a hashing method is used to reduce the dimensionality of, and classify, the abstract representation of the agent's observed state set obtained from the convolutional neural network, so as to perceive the distribution of the state space. On this basis, samples in the empirical data set are selected appropriately. Finally, the selected samples are used to train the agent's value function, giving it a more accurate judgment of the environment. The results are shown in Figures 1, 2, and 3.
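The following is a minimal sketch, in Python, of how such a hash-based perception of the state distribution could look. The SimHash-style random projection, the array shapes, and the function names are assumptions made for this illustration; the patent text only states that a hash method is applied to the CNN features.

```python
# Minimal sketch of perceiving the state distribution by hashing CNN
# features into discrete regions. The SimHash-style random projection,
# array shapes, and function names are illustrative assumptions; the
# patent text only says a hash method is used for dimension reduction.
import numpy as np

def hash_states(features, n_bits=16, seed=0):
    """Map high-dimensional CNN features to discrete hash buckets."""
    rng = np.random.default_rng(seed)
    projection = rng.normal(size=(features.shape[1], n_bits))
    bits = (features @ projection) > 0            # (N, n_bits) sign pattern
    powers = 1 << np.arange(n_bits)               # encode bits as integer ids
    return bits.astype(np.int64) @ powers         # (N,) bucket id per sample

# Example: 10,000 transitions with 512-dimensional CNN features.
features = np.random.randn(10_000, 512)
buckets = hash_states(features)
ids, counts = np.unique(buckets, return_counts=True)
print(f"{len(ids)} occupied state-space regions; "
      f"largest region holds {counts.max()} samples")
```

The bucket counts give a rough picture of how unevenly the observed states cover the state space, which is the information the subsequent sample selection relies on.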

[0061] Figure 1 shows the result of visualizing the samples after steps S1 and S2 of the present invention are applied to the original empirical data, i.e., a schematic diagram of the distribution of the samples in the state space;

[0062] Figure 2 shows the results of adopting three sampling methods, namely a) ...


Abstract

The invention discloses a state distribution perception sampling-based deep-value-function learning method for an agent. The method enables the agent to learn a value function more quickly from fewer samples, and specifically includes the following steps: 1) obtaining the empirical data used by the agent for learning the value function, and defining the algorithm objective; 2) using a convolutional neural network to preprocess the empirical data to obtain a more expressive feature set; 3) using an unsupervised method to cluster the empirical data set in its feature space; 4) sampling according to the state distribution of the empirical data set, using a state distribution perception sampling method based on interpolation between uniform sampling and cluster equal-probability sampling; and 5) using the sampled samples for the agent's value function learning. The method is suitable for game problems in the reinforcement learning field and can quickly achieve good results with a smaller number of samples.
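As an illustration of step 4), the sketch below mixes a uniform distribution over samples with a cluster equal-probability distribution. The mixing weight `lam`, the cluster ids, and the batch size are assumptions made for this example rather than details taken from the patent.

```python
# Hedged sketch of step 4): sampling probabilities that interpolate
# between uniform sampling and cluster equal-probability sampling.
# The mixing weight `lam`, cluster ids, and batch size are assumptions
# for this example, not values from the patent.
import numpy as np

def sampling_probs(cluster_ids, lam=0.5):
    """Per-sample probability mixing uniform and cluster-uniform draws."""
    n = len(cluster_ids)
    _, inverse, counts = np.unique(cluster_ids, return_inverse=True,
                                   return_counts=True)
    k = len(counts)
    uniform = np.full(n, 1.0 / n)                   # every sample equally likely
    cluster_uniform = 1.0 / (k * counts[inverse])   # every cluster equally likely
    return lam * uniform + (1.0 - lam) * cluster_uniform

# Draw a minibatch of 32 transitions according to the mixed distribution.
cluster_ids = np.random.randint(0, 20, size=10_000)
p = sampling_probs(cluster_ids, lam=0.3)
batch_idx = np.random.choice(len(cluster_ids), size=32, replace=False, p=p)
```

Setting `lam = 1` recovers plain uniform sampling, while `lam = 0` gives every state-space cluster the same total probability mass regardless of how many samples fall into it; intermediate values trade off between the two.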

Description

Technical field

[0001] The present invention belongs to the field of reinforcement learning, a branch of machine learning, and particularly relates to a sample sampling method based on perception of the state distribution of empirical data.

Background technique

[0002] Sample selection is an important issue in machine learning, and different selection methods directly affect the quality of model learning. In reinforcement learning, sampling from empirical data sets helps overcome sample correlation and the forgetting of early samples. The goal of sample sampling is to select, from the sample set, samples that accelerate model convergence and enhance the agent's ability to perceive the environment. Traditional methods generally draw samples from the empirical data set by random, uniform sampling. This easily causes sample imbalance, which slows down the agent's learning.

[0003] The existing sampl...
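For contrast with the proposed method, the traditional baseline mentioned in [0002] can be sketched as follows; the replay-buffer structure, capacity, and batch size are assumptions for illustration only.

```python
# Hedged sketch of the traditional baseline described above: random,
# uniform sampling from an experience replay buffer. Buffer capacity
# and batch size are illustrative assumptions, not patent values.
import random
from collections import deque

buffer = deque(maxlen=100_000)   # stores (state, action, reward, next_state)

def store(transition):
    buffer.append(transition)

def sample_uniform(batch_size=32):
    """Every stored transition is equally likely, regardless of its state;
    frequently visited states therefore dominate the minibatches."""
    return random.sample(buffer, min(batch_size, len(buffer)))
```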

Claims


Application Information

IPC(8): G06N99/00, G06K9/62
CPC: G06F18/2321, G06F18/24
Inventors: 李玺, 李伟超, 皇福献
Owner: ZHEJIANG UNIV