Automatic Control Method Based on Multi-objective Reinforcement Learning Algorithm Using Gradients

A technology that combines reinforcement learning with multi-objective techniques, applied to automatic control using multi-objective reinforcement learning algorithms. It addresses three problems: low algorithm efficiency, slow convergence, and failure to use gradient information. It achieves high algorithm efficiency and faster convergence.

Active Publication Date: 2021-04-23

TSINGHUA UNIV

10 Cites, 0 Cited by


AI Technical Summary

Problems solved by technology

Multi-strategy methods fall into three categories: the convex hull method, the variable-parameter method, and reinforcement learning combined with a multi-objective optimization algorithm.

- The convex hull method relies on a weighted summation of gradients, so it has difficulty obtaining strategies in non-convex regions of the Pareto front.
- The variable-parameter method repeatedly executes a single-strategy method, so its algorithm efficiency is low.
- Combining reinforcement learning with a multi-objective optimization algorithm fails to use the gradient information already available in the reinforcement learning algorithm, so convergence is slow.
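The sketch below makes the convex hull limitation concrete. It is not taken from the patent. It uses three hypothetical policies whose expected returns on two objectives are fixed vectors. The point (0.4, 0.4) is Pareto-optimal but sits in a non-convex dent of the front, so no linear weighting of the two objectives ever selects it. The function names `dominates`, `pareto_front`, and `best_under_weights` are illustrative. For simplicity, the weighted sum is applied to returns rather than gradients, which is an assumption.

```python
from typing import List, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """True if return vector a Pareto-dominates b (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Vector]) -> List[Vector]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


def best_under_weights(points: List[Vector], w: float) -> Vector:
    """Point chosen by the linear weighted sum w*f1 + (1-w)*f2 (two objectives)."""
    return max(points, key=lambda p: w * p[0] + (1.0 - w) * p[1])


if __name__ == "__main__":
    # Hypothetical expected returns of three candidate policies on two objectives.
    candidates = [(1.0, 0.0), (0.0, 1.0), (0.4, 0.4)]
    front = pareto_front(candidates)
    # Sweep the weight over [0, 1] and record which policy the weighted sum picks.
    picked = {best_under_weights(candidates, k / 100) for k in range(101)}
    print("Pareto front:", front)
    print("Reachable by weighted sum:", sorted(picked))
```

All three candidates are on the Pareto front. For any weight w, however, the weighted sum scores the two extreme policies at w and 1 - w, and at least one of those is at least 0.5. The middle policy always scores 0.4, so the sweep only ever returns the two extreme policies.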



Embodiment

[0116] The automatic control method of the present invention is based on a gradient-based multi-objective reinforcement learning algorithm. It can be applied to the automatic control of unmanned vehicles, robots, unmanned aerial vehicles, and similar systems. This embodiment takes end-to-end adaptive cruise control in autonomous driving as an example. It further illustrates the invention by combining a deep neural network model with a reinforcement learning model.

[0117] The implementation of this method comprises the following steps:

[0118] Step 1. Construct a multi-objective reinforcement learning problem

[0119] The goal is to realize adaptive cruise control of the vehicle through an end-to-end automatic control method. In this embodiment, the input of the algorithm (i.e., the state) is therefore the image of the road ahead captured by the vehicle camera together with the vehicle speed, and the output of the algorithm (i.e., the action) is the...
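Step 1 frames adaptive cruise as a multi-objective reinforcement learning problem, in which each time step yields a reward vector with one component per objective. The sketch below assumes that framing. This excerpt does not list the objectives or the action space, so `CruiseState`, `vector_return`, and the discount factor `gamma` are hypothetical illustrations. The action is deliberately left out.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass
class CruiseState:
    """State as described in the embodiment: front-road camera image plus vehicle speed."""
    image: Any    # camera frame; format and resolution are not specified in the source
    speed: float  # vehicle speed; units are not specified in the source


def vector_return(rewards: Sequence[Sequence[float]], gamma: float) -> List[float]:
    """Discounted return computed separately for each objective.

    rewards[t][k] is the (hypothetical) reward for objective k at time step t.
    The result G satisfies G[k] = sum_t gamma**t * rewards[t][k].
    """
    num_objectives = len(rewards[0]) if rewards else 0
    totals = [0.0] * num_objectives
    for t, r_t in enumerate(rewards):
        for k, r in enumerate(r_t):
            totals[k] += (gamma ** t) * r
    return totals
```

The return stays a vector and is never collapsed into a single scalar. This is what allows a multi-objective method to compare policies by Pareto dominance instead of by a fixed weighting.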


Abstract

The invention relates to the technical field of automatic control, in particular to an automatic control method based on a gradient-based multi-objective reinforcement learning algorithm. The invention uses the gradient information already available in the reinforcement learning algorithm to update the function parameters. It compares with existing approaches as follows:

- Compared with single-strategy multi-objective reinforcement learning algorithms, this algorithm obtains the Pareto-front solution set, so different control strategies can be selected for different requirements during actual control.
- Compared with the convex hull method among multi-strategy multi-objective reinforcement learning algorithms, it does not depend on a linear weighting of the objective returns. It can therefore obtain control strategies in non-convex regions of the front.
- Compared with the multi-parameter method among multi-strategy multi-objective reinforcement learning algorithms, it solves for all Pareto-front solutions in a single run, so its algorithm efficiency is high.
- Compared with multi-strategy multi-objective reinforcement learning methods that incorporate a multi-objective optimization algorithm, it uses gradient information to accelerate convergence.
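The abstract says the invention updates parameters with gradient information rather than a linear weighting of returns, but this excerpt does not give the update rule. As a stand-in, the sketch below shows one well-known way to combine two objectives' gradients without fixed weights: the closed-form min-norm step of the multiple-gradient descent algorithm (MGDA), written for gradient ascent on expected returns. It is not claimed to be the patent's method. On its own, it yields one Pareto-stationary solution per run, so it does not reproduce the claim of obtaining the whole front at once. The names `min_norm_direction` and `ascent_step` and the learning rate `lr` are hypothetical.

```python
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def min_norm_direction(g1: List[float], g2: List[float]) -> Tuple[float, List[float]]:
    """Min-norm point of the convex hull of two objective gradients (two-task MGDA).

    Solves min over a in [0, 1] of ||a*g1 + (1-a)*g2||^2 in closed form.
    The result d satisfies d.g1 >= ||d||^2 and d.g2 >= ||d||^2, so a small
    ascent step along d does not decrease either objective to first order.
    """
    diff = [x - y for x, y in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:  # identical gradients: every convex combination is the same vector
        return 0.5, list(g1)
    # Unconstrained minimizer a* = (g2 - g1).g2 / ||g1 - g2||^2, clipped to [0, 1].
    alpha = dot([y - x for x, y in zip(g1, g2)], g2) / denom
    alpha = min(1.0, max(0.0, alpha))
    direction = [alpha * x + (1.0 - alpha) * y for x, y in zip(g1, g2)]
    return alpha, direction


def ascent_step(theta: List[float], g1: List[float], g2: List[float],
                lr: float = 0.01) -> List[float]:
    """One hypothetical parameter update along the common ascent direction."""
    _, d = min_norm_direction(g1, g2)
    return [t + lr * x for t, x in zip(theta, d)]
```

For example, take the conflicting gradients g1 = (1, 0) and g2 = (-1, 1). The closed form gives alpha = 0.6 and direction (0.2, 0.4), whose inner product with each gradient is 0.2. Both returns therefore increase to first order, even though the raw gradients disagree on the first coordinate.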

Description

Technical field

[0001] The invention relates to the technical field of automatic control, in particular to an automatic control method based on a gradient-based multi-objective reinforcement learning algorithm.

Background

[0002] Reinforcement learning is an effective tool for solving the continuous-time decision-making problems of automatic robot control. In practice, however, many automatic control problems are multi-objective. A single-objective reinforcement learning algorithm struggles to find the correct mapping between the objectives and the state, and to learn the correct value function and policy. Multi-objective reinforcement learning is therefore required. Current multi-objective reinforcement learning methods fall into two main categories: single-strategy methods and multi-strategy methods. A single-strategy method can obtain only a single strategy and cannot obtain multiple Pareto fr...


Application Information


Patent Type & Authority: Patents (China)

IPC (8): G05B13/04

CPC: G05B13/042

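Inventors: 李克强 (Li Keqiang), 王庭晗 (Wang Tinghan), 罗禹贡 (Luo Yugong), 李升波 (Li Shengbo), 刘金鑫 (Liu Jinxin), 王建强 (Wang Jianqiang), 许庆 (Xu Qing), 高博麟 (Gao Bolin)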

Owner: TSINGHUA UNIV
