Online learning control method of nonlinear discrete time system

A nonlinear discrete-time system technology, applied in the field of online learning control, that addresses problems such as a tendency to fall into local optimal solutions and insufficient exploration of the state-policy space, achieving online learning with good exploration ability.

Pending Publication Date: 2021-10-08

INFORMATION SCI RES INST OF CETC

Cites: 6, Cited by: 0

AI Technical Summary

Problems solved by technology

Although the direct heuristic dynamic programming algorithm can achieve online adaptive optimal control, it still has the following shortcomings: 1) The algorithm uses an on-policy learning mechanism, which explores the state-policy space insufficiently and easily falls into a local optimal solution; 2) The hyperbolic tangent function is used as the activation function in both the execution network and the evaluation network, and all current convergence and stability analysis results are based on the hyperbolic tangent function. These results no longer apply to other types of activation function.

Embodiment Construction

[0067] The present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention. In addition, it should be noted that, for the convenience of description, only some structures related to the present invention are shown in the drawings but not all structures.

[0068] The present invention first considers the optimal control problem for the following nonlinear discrete-time system:

[0069] $x_{k+1} = F(x_k, u_k), \quad x_0 = x(0)$

[0070] where $x_k$ is the system state and $u_k$ is the system input. The system function $F(x_k, u_k)$ is Lipschitz continuous on the compact set $\Omega$ and satisfies $F(0,0)=0$. Assume that the system is stabilizable on $\Omega$, that is, there exists a control sequence $u_1, \ldots, u_k, \ldots$ such that $x_k \to 0$. In addition, sup...
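As a rough illustration of the system class described above, the following sketch simulates a discrete-time system of the form x_{k+1} = F(x_k, u_k). The particular function `F`, the feedback policy `linear_feedback`, and the helper `rollout` are hypothetical choices made only for this example; they are not taken from the patent. The chosen `F` satisfies F(0,0)=0 and is Lipschitz continuous, and the feedback law happens to drive the state toward zero, which is one way to picture the stabilizability assumption.

```python
import math


def F(x, u):
    """Hypothetical scalar system function (an assumption, not from the patent).

    It is Lipschitz continuous and satisfies F(0, 0) = 0.
    """
    return 0.8 * math.sin(x) + u


def linear_feedback(x):
    """Hypothetical stabilizing control law u_k = -0.5 * x_k."""
    return -0.5 * x


def rollout(x0, policy, steps):
    """Iterate x_{k+1} = F(x_k, policy(x_k)) and return the state sequence."""
    states = [x0]
    for _ in range(steps):
        x = states[-1]
        states.append(F(x, policy(x)))
    return states


trajectory = rollout(1.0, linear_feedback, 50)
print(trajectory[:3], trajectory[-1])
```

For this particular choice, the closed-loop map shrinks the magnitude of the state by at least half at every step when starting from x_0 = 1, so the printed final state is very close to zero.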

Abstract

The invention discloses an online learning control method for a nonlinear discrete-time system. The method comprises a behavior policy selection step, an optimal Q-function definition step, an evaluation network and execution network introduction step, an estimation error calculation step, and a final optimal weight calculation step. After the weights of the evaluation network and the execution network converge, the output of the execution network is an approximation of the optimal controller. The method does not require repeated iteration between policy evaluation and policy improvement, so the optimal controller can be learned online in real time. It adopts an off-policy learning mechanism, which effectively overcomes the insufficient exploration of the state-policy space in the direct heuristic dynamic programming method, and the execution network and the evaluation network can use activation functions of any form. Online learning of the optimal controller requires no system model, only the state data generated by the behavior policy.
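The interplay of behavior policy, Q-function, evaluation network, and execution network can be pictured with a generic off-policy actor-critic sketch. This is not the patent's algorithm: the scalar linear plant (`A`, `B`), the quadratic stage cost, the quadratic critic features, the linear actor u = -k x, the learning rates, and every function name below are assumptions made for illustration, and no convergence guarantee is claimed. The plant is used only to generate data; the updates themselves never read `A` or `B`.

```python
import random

A, B = 1.0, 1.0  # hypothetical plant x_{k+1} = A*x_k + B*u_k, used only to generate data


def features(x, u):
    """Quadratic critic features (an assumed non-tanh choice)."""
    return [x * x, x * u, u * u]


def q_value(w, x, u):
    """Critic (evaluation network) output: a weighted sum of the features."""
    return sum(wi * fi for wi, fi in zip(w, features(x, u)))


def stage_cost(x, u):
    """Assumed quadratic stage cost."""
    return x * x + u * u


def td_error(w, k, x, u, x_next):
    """Off-policy temporal-difference error.

    The applied action u may come from any behavior policy, while the
    bootstrap term uses the target (actor) policy u = -k * x_next.
    """
    return stage_cost(x, u) + q_value(w, x_next, -k * x_next) - q_value(w, x, u)


def critic_step(w, k, x, u, x_next, lr):
    """Semi-gradient update of the critic weights."""
    delta = td_error(w, k, x, u, x_next)
    return [wi + lr * delta * fi for wi, fi in zip(w, features(x, u))]


def actor_step(w, k, x, lr):
    """Gradient-descent step on Q(x, -k*x) with respect to the actor gain k."""
    u = -k * x
    dq_du = w[1] * x + 2.0 * w[2] * u
    return k + lr * dq_du * x  # du/dk = -x, so descent adds lr * dq_du * x


def learn(steps=20000, seed=0, lr_critic=0.05, lr_actor=0.002):
    """Interleave critic and actor updates on data from a noisy behavior policy."""
    rng = random.Random(seed)
    w, k = [0.0, 0.0, 0.0], 0.5
    x = rng.uniform(-1.0, 1.0)
    for step in range(steps):
        u = -k * x + rng.gauss(0.0, 0.3)  # behavior policy: actor output plus exploration noise
        x_next = A * x + B * u
        w = critic_step(w, k, x, u, x_next, lr_critic)
        k = actor_step(w, k, x, lr_actor)
        # Continue the trajectory, restarting every 20 steps or when the state grows large.
        keep = abs(x_next) < 2.0 and step % 20 != 0
        x = x_next if keep else rng.uniform(-1.0, 1.0)
    return w, k
```

Calling `learn()` returns the critic weights and the actor gain after the given number of steps. For this assumed plant and cost, the exact optimal Q-function is quadratic, so a useful sanity check is that `td_error` vanishes and `actor_step` leaves the gain unchanged when the exact optimal weights and gain are supplied.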

Description

Technical field

[0001] The invention relates to the field of industrial production control, in particular to an online learning control method for nonlinear discrete-time systems.

Background

[0002] In industrial production, engineers and technicians often need to optimize the controllers of control objects such as robots, drones, and unmanned vehicles to meet certain control performance indices. Because these control objects often exhibit strong nonlinearity, optimizing the controller is very difficult. From the perspective of optimal control, obtaining the optimal controller requires solving the Hamilton-Jacobi-Bellman equation (HJB equation), but the HJB equation is a nonlinear partial differential equation, which is very difficult to solve. Traditional dynamic programming, variational methods, spectral methods, etc. often face great limitations in practical applications due to their extremely...

Application Information

IPC(8): G05B13/02

CPC: G05B13/027, G05B13/021, Y02P90/02

Inventors: 李新兴, 查文中, 王雪源, 王蓉

Owner: INFORMATION SCI RES INST OF CETC
