
Automatic discovery method of complex system oriented MAXQ task graph structure

An automatic-discovery technology for complex systems, applied in instruments, electrical digital data processing, computers, etc. It addresses the problems that the automatic layering ability of MAXQ is weak and that subtasks often remain whose state space is too large to be divided further.

Active Publication Date: 2012-06-27
SOUTHEAST UNIV


Problems solved by technology

The MAXQ method can express the hierarchical structure of a task clearly as a task graph and has strong online-learning ability. However, its automatic layering ability is weak: subtasks often remain whose state space is still too large to be divided further.




Embodiment Construction

[0053] The present invention will be described in detail below.

[0054] Assume that the interaction between the Agent and the environment occurs at a series of discrete moments t = 0, 1, 2, .... At each time t, the Agent observes the environment and obtains the state sₜ ∈ S. The Agent chooses an exploration action aₜ ∈ A according to strategy π and executes it. At the next moment t+1, the Agent receives the reinforcement signal (reward value) rₜ₊₁ ∈ R from the environment and reaches a new state sₜ₊₁ ∈ S. According to the reinforcement signal rₜ₊₁, the Agent improves the strategy π. The ultimate goal of reinforcement learning is to find an optimal strategy π* that maximizes (or minimizes) the state value obtained by the Agent (that is, the total discounted reward obtained from the state):

V^π(sₜ) = E[ Σ_{k=0}^{∞} γᵏ rₜ₊ₖ₊₁ ],

where γ is the reward discount factor, 0 ≤ γ ≤ 1. Because the state transitions of the environment are stochastic, under policy π the value of state sₜ satisfies

V^π(sₜ) = Σ_{aₜ} π(sₜ, aₜ) Σ_{sₜ₊₁} P(sₜ₊₁ | sₜ, aₜ) [ rₜ₊₁ + γ V^π(sₜ₊₁) ],

where P(sₜ₊₁ | sₜ, aₜ) is the probability of transitioning to state sₜ₊₁ after executing action aₜ in state sₜ.
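The Agent–environment loop described above can be sketched as minimal tabular Q-learning. The 5-state chain MDP, the learning rate, and the ε-greedy exploration are illustrative assumptions, not details from the patent; only the update toward rₜ₊₁ + γ·max Q(sₜ₊₁, ·) is standard Q-learning.

```python
import random

N_STATES = 5          # states 0..4; reaching state 4 ends an episode
ACTIONS = [0, 1]      # 0 = left, 1 = right
GAMMA = 0.9           # reward discount factor, 0 <= gamma <= 1
ALPHA = 0.1           # learning rate
EPSILON = 0.1         # exploration probability

def step(s, a):
    """Environment: deterministic chain, reward 1 only on reaching the goal."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward, s_next == N_STATES - 1

def train(episodes=2000, seed=0):
    random.seed(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection (the exploration strategy pi)
            if random.random() < EPSILON:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[s][x])
            s_next, r, done = step(s, a)
            # Q-learning update toward r + gamma * max_a' Q(s', a')
            Q[s][a] += ALPHA * (r + GAMMA * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q

Q = train()
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(policy)  # greedy action in each non-terminal state
```

After training, the greedy policy moves right in every non-terminal state, reflecting the learned state values V(s) ≈ γ^(distance to goal).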


Abstract

The invention provides an automatic discovery method for a complex-system-oriented MAXQ task-graph structure. The method comprises the following steps: first, exploring the environment with Q-learning to find the state variables that influence actions; then invoking a clustering algorithm based on the action-execution effect, in which each data object starts as one atomic class and the atomic classes are aggregated step by step into larger classes until the termination condition is met. The aggregation algorithm proceeds as follows: at the beginning every member forms a single cluster; in each subsequent iteration the nearest clusters are combined into one cluster, until all members form one cluster; the time and space complexities are O(n²). Once two clusters are combined by the clustering method, they cannot be separated back to their prior state. Finally, a hierarchical task graph is obtained. The method establishes a clustering model using all learned and perceived information, automatically constructs the MAXQ task graph through clustering, and finally realizes automatic layering of MAXQ.
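The bottom-up aggregation described in the abstract — every member starts as a singleton cluster, and each iteration irreversibly merges the nearest pair until one cluster remains — can be sketched as follows. The 1-D points and the single-linkage distance are illustrative assumptions; the patent clusters state variables by their action-execution effect.

```python
def single_linkage(c1, c2):
    """Distance between clusters = minimum pairwise distance of members."""
    return min(abs(a - b) for a in c1 for b in c2)

def agglomerate(points):
    """Merge clusters bottom-up; record the merge order (the dendrogram)."""
    clusters = [[p] for p in points]   # every member starts as a singleton cluster
    merges = []
    while len(clusters) > 1:
        # find the closest pair of clusters (O(n^2) pairs per iteration)
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # merges are irreversible: once combined, a cluster is never split
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0], merges

cluster, merges = agglomerate([1.0, 1.1, 5.0, 5.2, 9.0])
print(sorted(cluster))   # all members end in one cluster
print(len(merges))       # n - 1 merges for n members
```

The recorded merge order is the hierarchy itself: cutting the sequence of merges at any depth yields one level of the resulting task graph.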

Description

technical field [0001] The invention relates to a computer-implemented automatic layering method of hierarchical reinforcement learning for large-scale task solving in complex systems. technical background [0002] At present, no computer method has been found that solves the MAXQ automatic stratification problem under large-scale tasks by combining clustering methods. Although some methods can solve the layering problem of hierarchical reinforcement learning, such as the bottleneck and landmark state method, the shared subspace method, the multidimensional state method, and the Markov space method, these methods are related to the present invention only in that both belong to the automatic-layering domain of hierarchical reinforcement learning; the specific problems they solve are completely different. Most previous methods are based on Option, Q-learning, and other methods, while the present invention is based on the hierarchical reinforcement learning of the...

Claims


Application Information

Patent Type & Authority Applications(China)
IPC IPC(8): G06F15/18
Inventor 王红兵 (Wang Hongbing); 李文雅 (Li Wenya)
Owner SOUTHEAST UNIV