Multi-objective reinforcement learning method and device based on Pareto optimization

A multi-objective reinforcement learning technology, applied to specific mathematical models, machine learning, instruments, and related fields, to achieve fast convergence, good stability, and good performance.

Pending Publication Date: 2022-07-12

NAT UNIV OF DEFENSE TECH


AI Technical Summary

Problems solved by technology

[0007] The main purpose of the present invention is to provide a multi-objective reinforcement learning method and device based on Pareto optimization, which aims to solve the problem that the existing technology cannot achieve fast convergence and relatively good stability in deep learning.


Embodiment Construction

[0051] It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.

[0052] Refer to Figure 1, which is a schematic structural diagram of the multi-objective reinforcement learning device based on Pareto optimization in the hardware operating environment involved in the solution of the embodiment of the present invention.

[0053] As shown in Figure 1, the multi-objective reinforcement learning device based on Pareto optimization may include: a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to realize the connection and communication between these components. The user interface 1003 may include a display screen (Display), an input unit such as a keyboard (Keyboard), and the optional user interfa...


Abstract

The invention relates to the field of reinforcement learning and discloses a multi-objective reinforcement learning method and device based on Pareto optimization. The method comprises: solving the multi-objective reinforcement learning problem using a summarization approach and calculating the Q value of each sub-objective for each policy; performing non-dominated sorting on the sub-objective Q values using Pareto dominance theory to obtain a Pareto front set; randomly selecting an action from the Pareto front set to interact with the environment; generating a multi-objective DQN algorithm based on the Pareto front set, and training a target network with the DQN algorithm to generate a policy network; and updating the expectation of the Pareto front sub-objective Q values according to the policy network. By directly generalizing the deep Q-network to multiple objectives, the method approximates the set of all Pareto-optimal deterministic policies, thereby showing better performance, rapid convergence, and relatively better stability, and providing more diverse solutions.
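The non-dominated sorting and random action selection named in the abstract can be illustrated with a small sketch. This is not the patented implementation: the function names (`dominates`, `pareto_front`, `select_action`, `pareto_expected_target`), the use of a plain list of per-action Q vectors, and the reading of "expectation of the Pareto front sub-objective Q values" as a uniform mean over the front are all assumptions made for illustration.

```python
import random
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if vector a Pareto-dominates b (maximization on every objective)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(q_values: Sequence[Sequence[float]]) -> List[int]:
    """Indices of actions whose Q vectors are not dominated by any other action."""
    return [
        i for i, q in enumerate(q_values)
        if not any(dominates(other, q) for other in q_values)
    ]


def select_action(q_values: Sequence[Sequence[float]], rng: random.Random) -> int:
    """Pick an action uniformly at random from the Pareto front set."""
    return rng.choice(pareto_front(q_values))


def pareto_expected_target(
    reward: Sequence[float],
    next_q_values: Sequence[Sequence[float]],
    gamma: float = 0.99,
    done: bool = False,
) -> List[float]:
    """Vector TD target: r + gamma * mean of next-state Q vectors on the front.

    The uniform mean is an assumed interpretation, not taken from the source.
    """
    if done:
        return list(reward)
    front = pareto_front(next_q_values)
    mean_q = [
        sum(next_q_values[i][k] for i in front) / len(front)
        for k in range(len(reward))
    ]
    return [r + gamma * m for r, m in zip(reward, mean_q)]


# Hypothetical Q vectors for four actions and two sub-objectives.
q = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.2]]
front = pareto_front(q)            # action 3 is dominated by action 2
action = select_action(q, random.Random(0))
target = pareto_expected_target([1.0, 0.0], q, gamma=0.5)
```

In a deep variant, `q` would come from a network with one output vector per action; the pairwise comparison here is quadratic in the number of actions, which is acceptable for small discrete action sets.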

Description

Technical field

[0001] The present application relates to the field of reinforcement learning, in particular to a multi-objective reinforcement learning method and device based on Pareto optimization.

Background technique

[0002] Reinforcement learning (RL) is a framework that learns an optimal action policy for an agent based on task-related future rewards received from the agent's environment. While most RL algorithms are structured to achieve a single specific goal, many real-world applications, such as economic systems, healthcare, transportation, management, natural resources, mission planning and robot control, are inherently characterized by multiple, potentially conflicting goals.

[0003] As a generalization of standard reinforcement learning methods, multi-objective reinforcement learning (MORL) addresses the need for trade-offs between competing objectives. In MORL, the reward signal is a vector for each agent, where each element represents a maximized ...
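To make the vector-valued reward concrete, the following minimal sketch accumulates a discounted return per objective and compares trajectories by Pareto dominance. The two objectives (speed, energy saving), the trajectories, and the helper names `discounted_return` and `dominates` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def discounted_return(rewards: Sequence[Sequence[float]], gamma: float) -> List[float]:
    """Discounted sum of reward vectors, computed separately for each objective."""
    total = [0.0] * len(rewards[0])
    for t, reward in enumerate(rewards):
        for k, r_k in enumerate(reward):
            total[k] += (gamma ** t) * r_k
    return total


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


# Hypothetical two-step trajectories; each reward is (speed, energy saving).
fast = discounted_return([(1.0, -1.0), (1.0, -1.0)], gamma=0.5)
slow = discounted_return([(0.5, 0.0), (0.5, 0.0)], gamma=0.5)
poor = discounted_return([(0.5, -1.0), (0.5, -1.0)], gamma=0.5)

# fast and slow trade one objective against the other, so neither dominates;
# poor is worse or equal on both objectives and is dominated by each of them.
conflict = not dominates(fast, slow) and not dominates(slow, fast)
```

When `conflict` holds, no single scalar ordering ranks the two trajectories without an extra preference, which is the trade-off situation that MORL is meant to handle.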


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06N20/00, G06N7/00

CPC: G06N20/00, G06N7/01

Inventor: 冯旸赫, 阳方杰, 黄红蓝, 施伟, 马扬, 程光权, 黄金才, 刘忠

Owner: NAT UNIV OF DEFENSE TECH
