Online learning control method of nonlinear discrete time system

Applied to online learning control of nonlinear discrete-time systems, the technique addresses two problems of existing methods: a tendency to fall into local optimal solutions and insufficient exploration of the state-policy space. It achieves online learning with good exploration ability.

Pending Publication Date: 2021-10-08
INFORMATION SCI RES INST OF CETC

AI Technical Summary

Problems solved by technology

Although the direct heuristic dynamic programming algorithm can achieve online adaptive optimal control, it still has the following shortcomings: 1) The algorithm uses an on-policy learning mechanism, which suffers from insufficient exploration of the state-policy space and easily falls into the loca...


Embodiment Construction

[0067] The present invention is described in further detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention, not to limit it. In addition, for convenience of description, the drawings show only the structures related to the present invention rather than all structures.

[0068] The present invention first considers the following optimal control problem for a nonlinear discrete-time system. Consider the discrete-time system:

[0069] $x_{k+1} = F(x_k, u_k), \quad x_0 = x(0)$

[0070] where $x_k$ is the system state and $u_k$ is the system input. The system function $F(x_k, u_k)$ is Lipschitz continuous on the compact set $\Omega$ and satisfies $F(0,0)=0$. Assume that the system is stabilizable on $\Omega$, that is, there exists a control sequence $u_1, \dots, u_k, \dots$ such that $x_k \to 0$. In addition, sup...
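The system definition above can be illustrated with a minimal Python sketch. The scalar dynamics `F`, the helper `rollout`, and the `zero_input` policy are hypothetical choices made only for illustration; the text does not specify a particular F. The chosen F is Lipschitz continuous and satisfies F(0, 0) = 0, and with zero input the state contracts toward the origin.

```python
import math

def F(x, u):
    """Hypothetical scalar dynamics (an assumption), chosen so that F(0, 0) = 0."""
    return 0.8 * math.sin(x) + u

def rollout(x0, policy, steps):
    """Iterate x_{k+1} = F(x_k, u_k) with u_k = policy(x_k); return all states."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(F(x, policy(x)))
    return xs

def zero_input(x):
    """Hypothetical control sequence u_k = 0 for every k."""
    return 0.0

# Starting from x_0 = 1, the state shrinks because |0.8 * sin(x)| <= 0.8 * |x|.
trajectory = rollout(1.0, zero_input, 50)
```

For this particular F, the zero-input sequence already drives the state to 0, so it is one example of the control sequence whose existence the assumption requires.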


Abstract

The invention discloses an online learning control method for nonlinear discrete-time systems. The method comprises a behavior policy selection step, an optimal Q-function definition step, a step introducing a critic (evaluation) network and an actor (execution) network, an estimation error calculation step, and a final optimal weight calculation step. After the weights of the critic and actor networks converge, the output of the actor network approximates the optimal controller. The method requires no repeated iteration between policy evaluation and policy improvement, so the optimal controller can be learned online in real time. It adopts an off-policy learning mechanism, which effectively overcomes the insufficient exploration of the state-policy space in the direct heuristic dynamic programming method, and the actor and critic networks can use activation functions of any form. Online learning of the optimal controller requires no system model, only the state data generated by the behavior policy.
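The general shape of such a scheme can be sketched in Python as a single online update of an evaluation (critic) network and an execution (actor) network from one sample produced by a behavior policy. This is not the update law of the invention, which is not given in the text above; it is a generic off-policy actor-critic Q-learning step under stated assumptions. The plant constants `A` and `B`, the quadratic `stage_cost`, the quadratic critic features, the linear actor u = v x, and the step sizes `alpha` and `beta` are all hypothetical.

```python
import random

# Hypothetical plant, used only to generate data; the learner never reads A or B.
A, B = 0.5, 1.0

def plant(x, u):
    return A * x + B * u

def stage_cost(x, u):
    # Assumed quadratic cost; the cost function is not given in the text.
    return x * x + u * u

def features(x, u):
    # Assumed critic parameterization: Q(x, u) = w . [x^2, x*u, u^2].
    return [x * x, x * u, u * u]

def q_value(w, x, u):
    return sum(wi * fi for wi, fi in zip(w, features(x, u)))

def learn_step(w, v, x, u, alpha=0.1, beta=0.1):
    """One online update from a single behavior-policy sample (x, u)."""
    x_next = plant(x, u)
    # Off-policy target: the next action comes from the actor u = v * x,
    # not from the behavior policy that produced u.
    target = stage_cost(x, u) + q_value(w, x_next, v * x_next)
    td_error = target - q_value(w, x, u)
    # Critic: semi-gradient step on the squared estimation error.
    w = [wi + alpha * td_error * fi for wi, fi in zip(w, features(x, u))]
    # Actor: gradient step that lowers Q(x, v * x) with respect to v.
    dq_du = w[1] * x + 2.0 * w[2] * (v * x)
    v = v - beta * dq_du * x
    return w, v, x_next, td_error

def run(steps=1000, seed=0):
    """Interleave critic and actor updates at every step, with no inner iteration."""
    rng = random.Random(seed)
    w, v, x = [0.0, 0.0, 0.0], 0.0, 1.0
    for _ in range(steps):
        u = rng.uniform(-0.5, 0.5)  # exploratory behavior policy (assumption)
        w, v, x, _ = learn_step(w, v, x, u, alpha=0.05, beta=0.05)
    return w, v
```

Both sets of weights change at every time step, which mirrors the claim that no alternation between policy evaluation and policy improvement is needed. The sketch makes no convergence claim; step sizes and the behavior policy would need tuning for any real system.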

Description

Technical field

[0001] The invention relates to the field of industrial production control, and in particular to an online learning control method for nonlinear discrete-time systems.

Background

[0002] In industrial production, engineers and technicians often need to optimize the controllers of control objects such as robots, drones, and unmanned vehicles to meet given control performance indices. Because these control objects often exhibit strong nonlinearity, optimizing the controller is very difficult. From the perspective of optimal control, obtaining the optimal controller requires solving the Hamilton-Jacobi-Bellman equation (HJB equation), but the HJB equation is a nonlinear partial differential equation and is very difficult to solve. Traditional dynamic programming, variational methods, spectral methods, etc. often face great limitations in practical applications due to their extremely...


Application Information

IPC(8): G05B13/02
CPC: G05B13/027, G05B13/021, Y02P90/02
Inventor: 李新兴, 查文中, 王雪源, 王蓉
Owner INFORMATION SCI RES INST OF CETC