Automatic Control Method Based on Multi-objective Reinforcement Learning Algorithm Using Gradients

A multi-objective reinforcement learning technique, applied to automatic control, that addresses the low efficiency, slow convergence, and unused gradient information of existing algorithms, and achieves higher algorithm efficiency and faster convergence.

Active Publication Date: 2021-04-23
TSINGHUA UNIV
10 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Multi-strategy methods fall into three categories: the convex hull method, the variable-parameter method, and methods that combine a reinforcement learning algorithm with a multi-objective optimization algorithm. The convex hull method relies on a weighted summation of gradients, so it has difficulty obtaining strategies in the non-convex region. The variable-parameter method repeatedly executes a single-strategy method, so its efficiency is low. Methods that combine reinforcement learning with a multi-objective optimization algorithm fail to use the gradient information already available in the reinforcement learning algorithm, so they converge slowly.
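The limitation of linear weighting can be illustrated with a small numerical sketch. The three strategies and their return values below are hypothetical and are not taken from the patent; they are chosen only so that strategy "B" is Pareto-optimal but lies in a non-convex part of the front. Under those assumptions, no choice of weights makes a weighted sum select "B".

```python
# Hypothetical returns (objective 1, objective 2) of three Pareto-optimal strategies.
# "B" lies below the straight line joining "A" and "C", i.e. in a non-convex region.
returns = {"A": (1.0, 0.0), "B": (0.4, 0.4), "C": (0.0, 1.0)}


def best_by_weighted_sum(returns, w1):
    """Return the strategy name that maximizes w1 * r1 + (1 - w1) * r2."""
    return max(
        returns,
        key=lambda name: w1 * returns[name][0] + (1.0 - w1) * returns[name][1],
    )


# Sweep the weight on objective 1 from 0 to 1 and collect every winner.
weights = [i / 100 for i in range(101)]
winners = {best_by_weighted_sum(returns, w) for w in weights}

print(sorted(winners))  # "B" never appears, whatever the weight
```

For any weight, "B" scores 0.4 while the better of "A" and "C" scores at least 0.5, which is why a purely linear scalarization skips it.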


Examples

Embodiment

[0116] The automatic control method of the present invention, which is based on a gradient-based multi-objective reinforcement learning algorithm, can be applied to the automatic control of unmanned vehicles, robots, unmanned aerial vehicles, and the like. This embodiment takes end-to-end adaptive cruise control in automated driving as an example and combines a deep neural network model with a reinforcement learning model to further illustrate the invention.

[0117] The implementation of this method comprises the following steps:

[0118] Step 1. Construct a multi-objective reinforcement learning problem

[0119] Since the goal is to realize the vehicle's adaptive cruise function through an end-to-end automatic control method, in this embodiment the input of the algorithm (i.e., the state) is determined to be the image of the road ahead captured by the vehicle camera together with the vehicle speed, and the output of the algorithm (i.e., the action) is the...
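The state described in [0119] can be sketched as a small data structure. This is only an illustration under stated assumptions: the names `CruiseState` and `Transition`, the speed unit, and the image representation are hypothetical. The source text is truncated before it defines the action, so the action is left as an opaque placeholder, and the reward is modelled as one value per objective without naming the objectives.

```python
from dataclasses import dataclass
from typing import Any, Sequence, Tuple


@dataclass(frozen=True)
class CruiseState:
    """State from [0119]: front-road camera image plus vehicle speed."""
    image: Sequence[Sequence[float]]  # assumed grayscale rows; resolution not given in the source
    speed: float                      # assumed unit: m/s


@dataclass(frozen=True)
class Transition:
    """One multi-objective experience tuple (hypothetical layout)."""
    state: CruiseState
    action: Any                  # opaque: the source is truncated before defining the action
    rewards: Tuple[float, ...]   # vector reward, one entry per objective
    next_state: CruiseState


def make_transition(state, action, rewards, next_state, n_objectives):
    """Build a Transition, checking that the reward vector has one entry per objective."""
    rewards = tuple(float(r) for r in rewards)
    if len(rewards) != n_objectives:
        raise ValueError(f"expected {n_objectives} rewards, got {len(rewards)}")
    return Transition(state, action, rewards, next_state)


# Usage with placeholder values.
s0 = CruiseState(image=[[0.0, 0.1], [0.2, 0.3]], speed=20.0)
s1 = CruiseState(image=[[0.0, 0.1], [0.2, 0.4]], speed=20.5)
t = make_transition(s0, action=None, rewards=[0.7, -0.1], next_state=s1, n_objectives=2)
print(t.rewards)
```

Keeping the reward as a vector rather than a pre-weighted scalar is what distinguishes the multi-objective formulation from a single-objective one.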

Abstract

The invention relates to the technical field of automatic control, and in particular to an automatic control method based on a gradient-based multi-objective reinforcement learning algorithm. The invention uses the gradient information already available in the reinforcement learning algorithm to update the function parameters. Compared with single-strategy multi-objective reinforcement learning algorithms, this algorithm obtains the Pareto frontier solution set, so different control strategies can be selected according to different needs during actual control. Compared with the convex hull method among multi-strategy multi-objective reinforcement learning algorithms, it does not depend on a linear weighting of the objective returns and can obtain control strategies in the non-convex region of the frontier. Compared with the multi-parameter method among multi-strategy multi-objective reinforcement learning algorithms, it solves for all Pareto frontier solutions at one time, so its efficiency is high. Compared with multi-strategy methods that combine reinforcement learning with a multi-objective optimization algorithm, it uses gradient information to accelerate convergence.
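The notion of a Pareto frontier solution set, and of choosing a control strategy from it according to need, can be sketched as follows. This is a generic non-dominated filter, not the patent's algorithm (the excerpt does not give the update rule); the candidate return vectors, the function names, and the selection rule `select` are all hypothetical, and both objectives are assumed to be maximized.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def select(front, min_first):
    """Among front points with objective 1 >= min_first, pick the best objective 2."""
    feasible = [p for p in front if p[0] >= min_first]
    return max(feasible, key=lambda p: p[1]) if feasible else None


# Hypothetical expected returns of five candidate strategies.
candidates = [(1.0, 0.0), (0.4, 0.4), (0.0, 1.0), (0.3, 0.3), (0.5, 0.0)]

front = pareto_front(candidates)
print(front)
print(select(front, min_first=0.3))
```

Here the filter retains the point (0.4, 0.4), which sits in a non-convex part of the front, and the selection step then picks it when objective 1 must be at least 0.3.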

Description

Technical field

[0001] The invention relates to the technical field of automatic control, and in particular to an automatic control method based on a gradient-based multi-objective reinforcement learning algorithm.

Background art

[0002] Reinforcement learning is an effective tool for solving continuous time-domain decision-making problems in robot automatic control, but in practice many automatic control problems are multi-objective. It is difficult for a single-objective reinforcement learning algorithm to find the correct mapping between the objectives and the state and to learn the correct value function and strategy, so multi-objective reinforcement learning is required. At present, multi-objective reinforcement learning methods are mainly divided into two categories: single-strategy methods and multi-strategy methods. The single-strategy method can only obtain a single strategy, and cannot obtain multiple Pareto fr...

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G05B13/04
CPC: G05B13/042
Inventor: 李克强, 王庭晗, 罗禹贡, 李升波, 刘金鑫, 王建强, 许庆, 高博麟
Owner: TSINGHUA UNIV