Multi-objective reinforcement learning method and device based on Pareto optimization

A multi-objective reinforcement learning technique, applied in the fields of specific mathematical models, machine learning, and instruments, that achieves fast convergence, good stability, and good performance.

Pending Publication Date: 2022-07-12
NAT UNIV OF DEFENSE TECH

AI Technical Summary

Problems solved by technology

[0007] The main purpose of the present invention is to provide a multi-objective reinforcement learning method and device based on Pareto optimization, addressing the problem that existing techniques cannot achieve fast convergence and relatively good stability in deep learning.


Embodiment Construction

[0051] It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.

[0052] Refer to Figure 1, which is a schematic structural diagram of the hardware operating environment of the multi-objective reinforcement learning device based on Pareto optimization involved in the embodiments of the present invention.

[0053] As shown in Figure 1, the multi-objective reinforcement learning device based on Pareto optimization may include: a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 provides the connection and communication between these components. The user interface 1003 may include a display screen (Display), an input unit such as a keyboard (Keyboard), and the optional user interfa...


Abstract

The invention relates to the field of reinforcement learning and discloses a multi-objective reinforcement learning method and device based on Pareto optimization. The method comprises: solving the multi-objective reinforcement learning problem using a summarization approach and calculating the Q value of each sub-objective for each policy; performing non-dominated sorting on the sub-objective Q values using Pareto dominance theory to obtain a Pareto front set; randomly selecting an action from the Pareto front set to interact with the environment; generating a multi-objective DQN algorithm based on the Pareto front set, and training a target network with the DQN algorithm to generate a policy network; and updating the expectation of the Pareto front sub-objective Q values according to the policy network. By directly generalizing the deep Q-network to multiple objectives, the method approaches the set of all Pareto-optimal deterministic policies, thereby showing better performance, rapid convergence, and relatively better stability, and providing more diverse solutions.
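The non-dominated sorting and random action selection steps in the abstract can be illustrated with a minimal sketch. It assumes each action has a vector of per-objective Q values, that all objectives are maximized, and that the Pareto front (frontier) set is the set of actions whose Q vectors no other action dominates. The function names `dominates`, `pareto_front_actions`, and `select_action` are hypothetical and are not taken from the patent; the network training steps are not shown.

```python
import random
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b under maximization:
    a is at least as good on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front_actions(q_values: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of actions whose Q vectors are not dominated
    by the Q vector of any other action."""
    return [
        i
        for i, qi in enumerate(q_values)
        if not any(dominates(qj, qi) for j, qj in enumerate(q_values) if j != i)
    ]


def select_action(q_values: Sequence[Sequence[float]], rng: random.Random) -> int:
    """Pick one action uniformly at random from the non-dominated set."""
    return rng.choice(pareto_front_actions(q_values))


# Illustrative Q vectors for four actions and two objectives (made-up numbers).
q = [
    [1.0, 0.2],  # best on objective 0
    [0.5, 0.9],  # best on objective 1
    [0.4, 0.1],  # dominated by action 0
    [1.0, 0.2],  # ties with action 0, so neither dominates the other
]
front = pareto_front_actions(q)
action = select_action(q, random.Random(0))
```

In this sketch, tied Q vectors both stay in the front, so ties do not remove candidate actions. The pairwise comparison is quadratic in the number of actions, which is usually acceptable for small discrete action spaces.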

Description

Technical field

[0001] The present application relates to the field of reinforcement learning, in particular to a multi-objective reinforcement learning method and device based on Pareto optimization.

Background technique

[0002] Reinforcement learning (RL) is a framework that learns an optimal action policy for an agent based on task-related future rewards received from the agent's environment. While most RL algorithms are structured to achieve a specific goal, many real-world applications are inherently characterized by the presence of multiple, potentially conflicting goals, such as economic systems, healthcare, transportation, management, natural resources, mission planning and robot control.

[0003] As a generalization of standard reinforcement learning methods, multi-objective reinforcement learning (MORL) addresses the need for trade-offs between competing objectives. In MORL, the reward signal is a vector for each agent, where each element represents a maximized ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N20/00, G06N7/00
CPC: G06N20/00, G06N7/01
Inventor: 冯旸赫, 阳方杰, 黄红蓝, 施伟, 马扬, 程光权, 黄金才, 刘忠
Owner: NAT UNIV OF DEFENSE TECH