Self-adapting random multi-arm decision-problem calculation method and device thereof

Status: Inactive
Publication Date: 2017-06-23
SUZHOU UNIV

AI Technical Summary

Problems solved by technology

Softmax (soft maximization) action selection grades the selection probability of each action solely by the currently known estimated value of that action. If an action happens to receive a very low reward in the early stage, its selection probability becomes very low, so the action may never be selected again even though it might yield a high reward later. As a result, the cumulative reward obtained at the end is not the best.
It can be seen that the decision-making effect of both ε-greedy and softmax selection is not ideal.
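The effect described above can be illustrated with a minimal sketch of softmax selection probabilities. The `temperature` parameter and the example estimates are assumptions for illustration; neither appears in the source.

```python
import math

def softmax_probabilities(estimates, temperature=1.0):
    """Return softmax selection probabilities for a list of value estimates.

    `temperature` is a hypothetical parameter (not named in the source);
    smaller values make the selection greedier.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(estimates)
    weights = [math.exp((q - peak) / temperature) for q in estimates]
    total = sum(weights)
    return [weight / total for weight in weights]

# Hypothetical estimates: the last action drew one unlucky low reward early on.
estimates = [1.0, 1.2, -3.0]
probs = softmax_probabilities(estimates, temperature=0.5)
```

With these assumed numbers, the last action's probability is far below that of the other two, so it is very unlikely to be tried again and its estimate is unlikely to be corrected.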
[0006] The upper confidence bound (UCB) action selection method, by contrast, makes full use of both the estimated value of each action and the number of times it has been selected. It computes the action to be selected at every step directly from the existing information, so its computational burden is relatively large.
In addition, the UCB method must select all actions in turn at the beginning of an experiment, so it is not applicable when the number of trials is less than or equal to the number of actions.
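The initial-round limitation can be seen in a minimal sketch. It assumes the widely used UCB1 form of the rule (estimate plus sqrt(2 ln t / n)); the source does not say which UCB variant it refers to, and the helper names are hypothetical.

```python
import math

def ucb1_select(estimates, counts, t):
    """Pick an action index with the UCB1 rule (an assumed variant)."""
    # Initial round: any action never tried is selected first, in index order.
    for action, n in enumerate(counts):
        if n == 0:
            return action
    # Afterwards: estimated value plus an exploration bonus that shrinks with n.
    scores = [q + math.sqrt(2.0 * math.log(t) / n)
              for q, n in zip(estimates, counts)]
    return max(range(len(scores)), key=scores.__getitem__)

def run_initial_steps(num_actions, horizon):
    """Record which actions are chosen over `horizon` steps.

    Rewards are omitted (estimates stay at 0.0) because only the
    selection order of the initial round is of interest here.
    """
    estimates = [0.0] * num_actions
    counts = [0] * num_actions
    chosen = []
    for t in range(1, horizon + 1):
        action = ucb1_select(estimates, counts, t)
        counts[action] += 1
        chosen.append(action)
    return chosen

# Five actions but only three trials: the whole run is spent on the initial round.
chosen = run_initial_steps(num_actions=5, horizon=3)
```

In this setting the estimates and confidence bounds never influence a single choice, which is the sense in which the method is not applicable to short experiments.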



Examples


Embodiment Construction

[0030] The core of the present invention is to provide an adaptive stochastic multi-arm decision-making problem calculation method and a device thereof, which balance exploration and exploitation, ensure the final decision-making effect, and are widely applicable.

[0031] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

[0032] The present invention provides an adaptive r...


Abstract

The invention discloses an adaptive stochastic multi-arm decision-making problem calculation method and a device thereof. The method comprises the following steps. The estimated value of each action and the number of times each action has been selected are initialized. From these estimated values and selection counts, the selection count m of the action with the smallest estimated value is determined. Exploration is performed with probability w / (w + m^2), and exploitation (utilization) is performed with probability 1 - w / (w + m^2), where w is a preset algorithm parameter. The exploration operation randomly selects, from the actions that currently have the smallest selection count, one action as the action for the next time step; the exploitation operation selects the action that currently has the largest estimated value as the action for the next time step. After the action for the next time step has been selected, a random reward is generated. The estimated value and selection count of each action are updated according to the random reward and the selected action, and m is determined again. This repeats until all actions are completed, after which the random rewards obtained within a preset maximum number of time steps are summed to give the cumulative reward. The method and the device balance exploration and exploitation, ensure the final decision-making effect, and are widely applicable.
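The procedure in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it reads m literally as the selection count of the action with the smallest estimated value, breaks ties by lowest index, draws Gaussian rewards with unit variance, and updates estimates as an incremental sample mean. None of these details, nor the names `select_action` and `run_bandit`, are given in the source.

```python
import random

def select_action(estimates, counts, w, rng):
    """Choose the next action: explore with probability w / (w + m^2)."""
    # m: selection count of the action with the smallest estimated value
    # (literal reading of the abstract; ties broken by lowest index, an assumption).
    worst = min(range(len(estimates)), key=estimates.__getitem__)
    m = counts[worst]
    explore_prob = w / (w + m ** 2)  # requires w > 0 so that m == 0 is defined
    if rng.random() < explore_prob:
        # Exploration: pick at random among the least-selected actions.
        fewest = min(counts)
        candidates = [a for a, n in enumerate(counts) if n == fewest]
        return rng.choice(candidates)
    # Exploitation: pick the action with the largest estimated value.
    return max(range(len(estimates)), key=estimates.__getitem__)

def run_bandit(reward_means, w=1.0, max_steps=1000, seed=0):
    """Run the loop for max_steps and return (cumulative reward, estimates, counts)."""
    rng = random.Random(seed)
    k = len(reward_means)
    estimates = [0.0] * k  # initial estimated values
    counts = [0] * k       # initial selection counts
    cumulative = 0.0
    for _ in range(max_steps):
        action = select_action(estimates, counts, w, rng)
        # Assumed reward model: Gaussian around a hypothetical per-action mean.
        reward = rng.gauss(reward_means[action], 1.0)
        counts[action] += 1
        # Assumed update rule: incremental sample mean of observed rewards.
        estimates[action] += (reward - estimates[action]) / counts[action]
        cumulative += reward
    return cumulative, estimates, counts

cumulative, estimates, counts = run_bandit([0.1, 0.5, 0.9], w=1.0, max_steps=500, seed=1)
```

Because m starts at 0, the exploration probability is 1 at the first step and falls as m grows, with w controlling how quickly the balance shifts toward exploitation. Unlike a fixed initial round, the loop runs for any number of time steps, including fewer steps than actions.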

Description

Technical field

[0001] The invention relates to the field of stochastic multi-way selection learning optimization, and in particular to an adaptive stochastic multi-arm decision-making problem calculation method and a device thereof.

Background technique

[0002] The multi-arm decision-making problem is a classic problem for studying the balance between exploration and exploitation in reinforcement learning. It was first used in diagnosis and treatment trials. In recent years its applications have become increasingly widespread, including recommender systems, crowdsourcing, and smart grids.

[0003] The stochastic multi-arm decision-making problem is a classic multi-arm decision-making problem and the basis of many newer variants. A stochastic multi-arm decision-making problem includes K arms; each arm corresponds to one action, and one action is selected at each time step. After each action is selected, a random reward w...


Application Information

IPC(8): G06F7/50, G06F7/58
CPC: G06F7/50, G06F7/582
Inventors: 周倩, 章晓芳, 章鹏
Owner: SUZHOU UNIV