
Automatic discovery method of complex system oriented MAXQ task graph structure

An automatic-discovery technology for complex systems, applied in instruments, electrical digital data processing, computers, etc. It addresses the problems that the automatic layering ability of MAXQ is weak and that subtasks often remain whose state space is too large to be divided further.

Active Publication Date: 2012-06-27
SOUTHEAST UNIV


Problems solved by technology

The MAXQ method can express the hierarchical structure of a task clearly as a task graph and has strong online-learning ability. However, its automatic layering ability is weak: subtasks often remain whose state space is still too large to be divided further.




Embodiment Construction

[0053] The present invention will be described in detail below.

[0054] Assume that the interaction between the Agent and the environment occurs at a series of discrete moments t = 0, 1, 2, .... At each time t, the Agent observes the environment and obtains the state sₜ ∈ S. The Agent chooses an exploration action aₜ ∈ A according to strategy π and executes it. At the next moment t+1, the Agent receives the reinforcement signal (reward value) rₜ₊₁ ∈ R from the environment and reaches a new state sₜ₊₁ ∈ S. According to the reinforcement signal rₜ₊₁, the Agent improves the strategy π. The ultimate goal of reinforcement learning is to find an optimal strategy π* that maximizes (or minimizes) the state value obtained by the Agent (that is, the total discounted reward obtained from the state):

V^π(sₜ) = E[ Σ_{k=0}^{∞} γᵏ rₜ₊ₖ₊₁ ],

where γ is the reward discount factor, 0 ≤ γ ≤ 1. Because the state transitions of the environment are stochastic, under policy π the value of state sₜ satisfies

V^π(sₜ) = Σ_{aₜ} π(sₜ, aₜ) Σ_{sₜ₊₁} P(sₜ₊₁ | sₜ, aₜ) [ rₜ₊₁ + γ V^π(sₜ₊₁) ],

where P(sₜ₊₁ | sₜ, aₜ) is the probability of transitioning to state sₜ₊₁ after executing action aₜ in state sₜ.
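The Agent–environment loop described above can be sketched as minimal tabular Q-learning. The 5-state chain MDP, the learning rate, and the ε-greedy exploration are illustrative assumptions, not details from the patent; only the update toward rₜ₊₁ + γ·max Q(sₜ₊₁, ·) is standard Q-learning.

```python
import random

N_STATES = 5          # states 0..4; reaching state 4 ends an episode
ACTIONS = [0, 1]      # 0 = left, 1 = right
GAMMA = 0.9           # reward discount factor, 0 <= gamma <= 1
ALPHA = 0.1           # learning rate
EPSILON = 0.1         # exploration probability

def step(s, a):
    """Environment: deterministic chain, reward 1 only on reaching the goal."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward, s_next == N_STATES - 1

def train(episodes=2000, seed=0):
    random.seed(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection (the exploration strategy pi)
            if random.random() < EPSILON:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[s][x])
            s_next, r, done = step(s, a)
            # Q-learning update toward r + gamma * max_a' Q(s', a')
            Q[s][a] += ALPHA * (r + GAMMA * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q

Q = train()
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(policy)  # greedy action in each non-terminal state
```

After training, the greedy policy moves right in every non-terminal state, reflecting the learned state values V(s) ≈ γ^(distance to goal).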


Abstract

The invention provides an automatic discovery method for a complex-system-oriented MAXQ task-graph structure. The method comprises the following steps: first, exploring the environment with Q-learning to find the state variables that influence actions; then invoking a clustering algorithm based on the action-execution effect, in which each data object starts as one atomic class and the atomic classes are aggregated step by step into larger classes until the termination condition is met. The aggregation algorithm proceeds as follows: at the beginning every member forms a single cluster; in each subsequent iteration the nearest clusters are combined into one cluster, until all members form one cluster; the time and space complexities are O(n²). Once two clusters are combined by the clustering method, they cannot be separated back to their prior state. Finally, a hierarchical task graph is obtained. The method establishes a clustering model using all learned and perceived information, automatically constructs the MAXQ task graph through clustering, and finally realizes automatic layering of MAXQ.
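The bottom-up aggregation described in the abstract — every member starts as a singleton cluster, and each iteration irreversibly merges the nearest pair until one cluster remains — can be sketched as follows. The 1-D points and the single-linkage distance are illustrative assumptions; the patent clusters state variables by their action-execution effect.

```python
def single_linkage(c1, c2):
    """Distance between clusters = minimum pairwise distance of members."""
    return min(abs(a - b) for a in c1 for b in c2)

def agglomerate(points):
    """Merge clusters bottom-up; record the merge order (the dendrogram)."""
    clusters = [[p] for p in points]   # every member starts as a singleton cluster
    merges = []
    while len(clusters) > 1:
        # find the closest pair of clusters (O(n^2) pairs per iteration)
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # merges are irreversible: once combined, a cluster is never split
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0], merges

cluster, merges = agglomerate([1.0, 1.1, 5.0, 5.2, 9.0])
print(sorted(cluster))   # all members end in one cluster
print(len(merges))       # n - 1 merges for n members
```

The recorded merge order is the hierarchy itself: cutting the sequence of merges at any depth yields one level of the resulting task graph.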

Description

technical field [0001] The invention relates to a computer-implemented automatic layering method of hierarchical reinforcement learning for large-scale task solving in complex systems. technical background [0002] At present, no computer method has been found that solves the MAXQ automatic stratification problem under large-scale tasks by combining clustering methods. Although some methods can solve the layering problem of hierarchical reinforcement learning, such as the bottleneck and landmark state method, the shared subspace method, the multidimensional state method, and the Markov space method, these methods are related to the present invention only in that both belong to the automatic-layering domain of hierarchical reinforcement learning; the specific problems they solve are completely different. Most previous methods are based on Option, Q-learning, and other methods, while the present invention is based on the hierarchical reinforcement learning of the...

Claims


Application Information

Patent Type & Authority Applications(China)
IPC IPC(8): G06F15/18
Inventor 王红兵 (Wang Hongbing); 李文雅 (Li Wenya)
Owner SOUTHEAST UNIV