A high-performance anti-interference method based on deep reinforcement learning interference estimator

By designing an adaptive filter based on a disturbance estimator using deep reinforcement learning, the problems of disturbance and noise effects in machining are solved, and high-precision machining control is achieved.

CN116880190B · Active · Publication Date: 2026-06-19 · ZHEJIANG UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ZHEJIANG UNIV OF TECH
Filing Date
2023-07-17
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

During machining, disturbances such as load inertia changes, tool wear, uncertain external forces, and model uncertainties affect machining accuracy, and measurement noise reduces control performance. Existing technologies are unable to effectively suppress disturbances and weaken the impact of noise.

Method used

An interference estimator based on deep reinforcement learning is adopted, and an adaptive filter structure is designed. Deep reinforcement learning automatically learns the filter gain in environments with time-varying disturbances and uncertain measurement noise, so that the disturbance signal is reconstructed quickly and noise amplification is suppressed, thereby improving the overall performance of the motion control system.

Benefits of technology

It enables rapid estimation of disturbances and effective suppression of noise during machining, improves the overall performance of the control system, and achieves high-precision machining.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN116880190B_ABST

Abstract

This invention belongs to the field of anti-interference technology and specifically relates to a high-performance anti-interference method based on a deep reinforcement learning interference estimator. The controller includes an input signal internal model, a state feedback controller, an equivalent input interference estimator, and a state observer. The equivalent input interference estimator estimates the total disturbance of the control system; adding compensation for this disturbance estimate to the system control input actively suppresses the influence of the total system disturbance. The estimator's filter uses a deep Q-network to learn and adjust its gain; changing this gain changes the control performance of the system and thus adaptively adjusts the disturbance estimation and noise attenuation capabilities. Using deep reinforcement learning, the invention learns the filter gain automatically under time-varying disturbances and uncertain measurement noise. It can quickly reconstruct the disturbance signal under abrupt disturbances and effectively suppress noise amplification under slowly varying disturbances, thereby improving the overall performance of the motion control system and achieving high-precision control in machining processes.

Description

Technical Field

[0001] This invention belongs to the field of anti-interference technology and specifically relates to a high-performance anti-interference method based on a deep reinforcement learning interference estimator.

Background Technology

[0002] In the machining processes of various automated equipment such as CNC machine tools, industrial robots, cutting machines, and engraving machines, machining accuracy is affected by disturbances such as changes in load inertia, tool wear, uncertain external forces, and model uncertainties. Anti-interference technology is one of the key technologies in machining; effectively suppressing disturbances is crucial for improving control accuracy. On the other hand, the impact of measurement noise on machining performance is equally significant; excessive noise amplification will reduce control performance. Therefore, how to suppress disturbances while mitigating the impact of noise is key to machining, and has important practical significance for improving machining accuracy and system performance, thereby meeting the precision requirements of high-end manufacturing.

[0003] An effective solution is to treat the effects of load inertia changes and external disturbances in the control system as a total system disturbance, and to estimate and compensate for it using disturbance estimation techniques. Over the past few decades, several observer-based active disturbance suppression techniques have been proposed, such as disturbance observers, unknown input observers, extended state observers, and the Equivalent-Input-Disturbance (EID) method. Among these, the EID method has a simple design and requires neither an inverse dynamic model of the controlled plant nor a disturbance model, and it has been applied successfully in a variety of disturbance suppression applications.

[0004] It is worth noting that when measurement noise is present in the system, observer-based disturbance estimation is inevitably affected by the noise. For closed-loop control systems based on an equivalent input disturbance estimator, disturbance suppression performance can be improved by adjusting the observer gain: a high observer gain achieves high-precision disturbance suppression but also amplifies noise excessively. Disturbance suppression and noise attenuation are therefore conflicting objectives, and a trade-off must be made according to the control requirements. By their form, disturbances can be classified as abrupt or slowly varying. Abrupt disturbances must be estimated and compensated quickly; for slowly varying disturbances, noise amplification should be avoided as far as possible while the disturbance is suppressed.

[0005] Therefore, considering the disturbances and measurement noise in the control systems of automated equipment, it is particularly important to design a high-performance adaptive anti-interference method based on a deep reinforcement learning disturbance estimator to improve machining accuracy.

Summary of the Invention

[0006] To reduce the impact of factors such as load inertia changes and external disturbances on motion control performance during machining, and considering the influence of measurement noise on disturbance estimation accuracy and system performance, this application proposes a high-performance anti-interference method based on a deep reinforcement learning disturbance estimator that is built on an equivalent input disturbance estimator. An adaptive filter structure based on deep reinforcement learning is designed, which automatically learns the filter gain under time-varying disturbances and uncertain measurement noise. This allows rapid reconstruction of the disturbance signal during abrupt disturbances and effective suppression of noise amplification during slowly varying disturbances, thereby improving the overall performance of the motion control system and achieving high-precision control during machining.

[0007] To achieve the above objectives, the technical solution adopted by this invention is: a high-performance anti-interference method based on a deep reinforcement learning interference estimator, applied to a motion control system, wherein the high-performance anti-interference method based on a deep reinforcement learning interference estimator includes:

[0008] Establish an equivalent input disturbance state-space model for the motion control system;

[0009] Design a state observer that obtains the state estimate x̂(k) from the system output y(k) of the motion control system and the state observer gain L;

[0010] Design an internal model system, and establish a state feedback controller for the motion control system based on the equivalent input disturbance state-space model of the motion control system and the internal model system. The state feedback controller computes the state feedback output u_f(k) from the internal model state x_I(k) and the state estimate x̂(k), as follows:

[0011] $u_f(k) = K_I x_I(k) + K_p \hat{x}(k)$

[0012] where k = 1, 2, 3, … is the sampling instant, K_I is the feedback gain of the internal model state x_I(k), and K_p is the feedback gain of the state estimate x̂(k);

[0013] Design a deep reinforcement learning interference estimator, which includes an equivalent input interference estimator and a filter F(z);

[0014] The equivalent input interference estimator obtains the total disturbance estimate d̃(k) from the observer gain L, the state feedback output u_f(k), and the system control input u(k);

[0015] The filter gain of the filter is adjusted based on deep reinforcement learning;

[0016] The final output of the deep reinforcement learning interference estimator is d̂(k), which represents the estimate of the total disturbance. The filtered disturbance estimate d̂(k) is expressed as follows:

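[0017] $\hat{d}(k) = Z^{-1}[\hat{D}(z)]$

[0018] $\hat{D}(z) = F(z)\,\tilde{D}(z)$

[0019] $\tilde{D}(z) = Z[\tilde{d}(k)]$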

[0020] In the formula, Z[·] and Z⁻¹[·] represent the Z-transform and the inverse Z-transform, respectively, and D̂(z) and D̃(z) are the Z-transforms of the disturbance estimate d̂(k) and the total disturbance estimate d̃(k), respectively;

[0021] Based on the disturbance estimate d̂(k), a negative compensation term is added to the state feedback output u_f(k), so that the system control input with disturbance compensation is:

[0022] $u(k) = u_f(k) - \hat{d}(k)$

[0023] Where u(k) represents the system control input of the motion control system.
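A minimal Python sketch of how the pieces of [0010]-[0022] combine in one control cycle. It assumes the feedback law has the linear form u_f(k) = K_I x_I(k) + K_p x̂(k) implied by the gain definitions in [0012] (the sign convention is an assumption) and a single control input; the function names and numbers are hypothetical.

```python
import numpy as np


def state_feedback_output(K_I, K_p, x_I, x_hat):
    """u_f(k) = K_I x_I(k) + K_p x_hat(k) (linear form and sign convention assumed)."""
    K_I, K_p = np.asarray(K_I, float), np.asarray(K_p, float)
    return float(K_I @ np.asarray(x_I, float) + K_p @ np.asarray(x_hat, float))


def compensated_input(u_f, d_hat):
    """u(k) = u_f(k) - d_hat(k): subtract the filtered disturbance estimate."""
    return u_f - d_hat


# Hypothetical numbers, only to show the data flow of one control cycle.
u_f = state_feedback_output([1.0, 2.0], [3.0, 4.0], [0.5, 0.25], [0.1, 0.2])
u = compensated_input(u_f, d_hat=0.6)
```

Because d̂(k) enters with a minus sign, a positive disturbance estimate lowers the commanded input, which is the "negative compensation" described in [0021].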

[0024] Furthermore, establishing the equivalent input disturbance state-space model of the motion control system includes:

[0025] The motion control system is represented as:

[0026] $x(k+1) = A x(k) + B u(k) + B_d d(k), \quad y(k) = C x(k) + v(k)$

[0027] In the formula, x(k) = [x1 x2]ᵀ is the system state of the motion control system at time k, where x1 and x2 are the system position and velocity, respectively; x(k+1) is the system state at time k+1; u(k) is the system control input; y(k) is the system output; v(k) is the sensor measurement noise; A, B, and C are system matrices whose dimensions match the system order; d(k) is the external disturbance; and B_d is the gain matrix of the external disturbance. The system of formula (6) satisfies the condition that (A, B, C) is controllable and observable.

[0028] Introducing the concept of equivalent input disturbance, let d_e(k) denote the equivalent input disturbance of the motion control system, i.e., the effect of d_e(k) on the system output is equivalent to the effect of the external disturbance d(k) on the system output y(k). The motion control system can therefore be rewritten as follows:

[0029] $x(k+1) = A x(k) + B[u(k) + d_e(k)], \quad y(k) = C x(k) + v(k)$

[0030] The equivalent input disturbance state-space model of the motion control system is obtained.

[0031] Furthermore, designing the state observer includes:

[0032] The state observer is designed as follows:

[0033] $\hat{x}(k+1) = A\hat{x}(k) + B u_f(k) + L[y(k) - \hat{y}(k)], \quad \hat{y}(k) = C\hat{x}(k)$

[0034] In the formula, x̂(k+1) is the state estimate at time k+1; A, B, and C are system matrices whose dimensions match the system order; ŷ(k) is the estimate of the system output y(k); u_f(k) is the state feedback output; and L is the state observer gain.
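The observer update can be sketched in Python as below. The sketch assumes the standard EID observer form x̂(k+1) = A x̂(k) + B u_f(k) + L[y(k) − C x̂(k)], driven by the state feedback output u_f(k) as listed in [0075], and a single-input, single-output plant. The function name and the test matrices are hypothetical.

```python
import numpy as np


def observer_step(A, B, C, L, x_hat, u_f, y):
    """Return x_hat(k+1) = A x_hat(k) + B u_f(k) + L [y(k) - C x_hat(k)].

    A: (n, n) state matrix; B, C, L: length-n vectors (single input/output).
    """
    A = np.asarray(A, float)
    B, C, L = (np.asarray(v, float) for v in (B, C, L))
    x_hat = np.asarray(x_hat, float)
    y_hat = C @ x_hat                 # estimated output y_hat(k)
    innovation = y - y_hat            # output estimation error
    return A @ x_hat + B * u_f + L * innovation


# Hypothetical check: with A - L C stable, the estimate converges to a
# constant true state when the model matches (noise-free, u_f = 0).
A = np.array([[1.0, 0.01], [0.0, 1.0]])
C = np.array([1.0, 0.0])
L = np.array([0.9, 20.0])             # places eig(A - L C) at 0.5 and 0.6
x_true = np.array([0.3, 0.0])         # zero velocity, so A @ x_true == x_true
x_hat = np.zeros(2)
for _ in range(60):
    x_hat = observer_step(A, [0.0, 0.01], C, L, x_hat, 0.0, C @ x_true)
```

The estimation error evolves as e(k+1) = (A − LC)e(k), so the observer gain L alone sets how fast x̂(k) tracks x(k) and, through the innovation term, how much output noise is passed into the estimate.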

[0035] Furthermore, the filter is expressed by the following formula:

[0036]

[0037] In the formula, z is the Z-transform operator, e is the exponential function, T is the sampling period of the motion control system, ω_a is the cutoff angular frequency, and φ_a is the filter gain.
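The closed-form F(z) is not legible in this copy, so the Python sketch below does not reproduce it. As a stand-in, it assumes a strictly proper first-order low-pass filter built from the quantities named above, F(z) = (1 − a)/(z − a) with a = e^(−ω_a T), which has unit DC gain and a one-step delay. How the learned gain φ_a enters the actual F(z) is not recoverable here, so φ_a is left out. The class name is hypothetical; ω_a = 100 and T = 10 ms are the embodiment values in [0150].

```python
import math


class FirstOrderLowPass:
    """Hypothetical F(z) = (1 - a) / (z - a), with a = exp(-omega_a * T).

    Difference equation: d_hat(k+1) = a * d_hat(k) + (1 - a) * d_tilde(k).
    The one-step delay means d_hat(k) depends only on earlier d_tilde values.
    """

    def __init__(self, omega_a: float, T: float):
        self.a = math.exp(-omega_a * T)
        self.state = 0.0              # current filter output d_hat(k)

    def output(self) -> float:
        return self.state

    def update(self, d_tilde: float) -> None:
        self.state = self.a * self.state + (1.0 - self.a) * d_tilde


# Step response with the embodiment's numbers (omega_a = 100, T = 0.01 s).
lpf = FirstOrderLowPass(omega_a=100.0, T=0.01)
response = []
for _ in range(5):
    lpf.update(1.0)
    response.append(lpf.output())
```

After n steps of a unit input the output is 1 − aⁿ, so a larger ω_a T (smaller a) means faster disturbance reconstruction but less noise smoothing. The one-step delay is one common way to avoid needing d̃(k) at the same instant it is computed; whether the patented filter works this way is an assumption.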

[0038] Furthermore, adjusting the filter gain based on deep reinforcement learning includes:

[0039] A deep Q-network is used to learn the filter gain under given interference and random sensor measurement noise conditions.

[0040] The state space, action space, and reward function of the deep Q-network are designed as follows:

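[0041] $s(k) = [\,y(k) - r(k),\ x(k),\ \varphi_a(k)\,]$

[0042] $E = \{-e_l,\ 0,\ e_u\}, \quad e_l, e_u \in (0, 1)$

[0043] $\varphi_a(k+1) = \varphi_a(k) + a(k), \quad a(k) \in E$

[0044] $\varphi_a(k) \in (\varphi_{\min}, \varphi_{\max})$

[0045] $r_e = -\beta \left| y(k) - r(k) \right| + \left( \varphi_a(k) - \varphi_{\min} \right)$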
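The definitions in [0041]-[0045] can be sketched in Python as follows. The constants are the embodiment values from [0150] (e_l = e_u = 0.1, φ_min = 0.4, φ_max = 1, β = 400). Keeping φ_a inside its bounds by clipping is an assumption, since the text states only the admissible interval. All function names are hypothetical.

```python
import numpy as np

E_L, E_U = 0.1, 0.1            # decrement e_l and increment e_u
PHI_MIN, PHI_MAX = 0.4, 1.0    # bounds of the filter gain phi_a
BETA = 400.0                   # reward weight beta
ACTIONS = (-E_L, 0.0, E_U)     # action space E: decrease, hold, increase


def make_state(y, r, x, phi):
    """s(k) = [y(k) - r(k), x(k), phi_a(k)] flattened into one vector."""
    return np.concatenate(([y - r], np.ravel(x), [phi])).astype(float)


def update_gain(phi, action_index):
    """phi_a(k+1) = phi_a(k) + a(k), clipped to [PHI_MIN, PHI_MAX] (clipping assumed)."""
    return float(np.clip(phi + ACTIONS[action_index], PHI_MIN, PHI_MAX))


def reward(y, r, phi):
    """r_e = -beta * |y(k) - r(k)| + (phi_a(k) - phi_min)."""
    return -BETA * abs(y - r) + (phi - PHI_MIN)
```

The first reward term penalizes tracking error and the second favors a larger φ_a, which the text associates with noise suppression; β sets how strongly tracking accuracy outweighs that noise-suppression index.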

[0046] In the formula, s(k) is the state space, r(k) is the input signal, y(k) is the system output of the motion control system, and y(k) - r(k) is the trajectory tracking error; φ_a(k) and φ_a(k+1) are the filter gains in the k-th and (k+1)-th control cycles; E is the action space, and the action a(k) decreases, holds, or increases the gain φ_a(k); e_l is the decrement and e_u is the increment; φ_max and φ_min are the upper and lower bounds of φ_a(k); r_e is the reward function, abs(·) is the absolute value function, and β is a weight parameter set to a positive constant; abs(y(k) - r(k)) is the absolute output error, and φ_a(k) - φ_min is the associated noise suppression index.

[0047] Furthermore, the steps by which the filter F(z) uses a deep Q-network to learn and adjust the filter gain φ_a include:

[0048] Step 1: Initialize the current network Q_ω(s(k), a(k)) with random parameters ω, and initialize the target network Q_ω⁻ by copying the same parameters, ω⁻ ← ω; initialize the experience replay pool R, and select the discount factor γ and the exploration probability ε;

[0049] Step 2: Select a state from the state space;

[0050] Step 3: Randomly generate a threshold R_a ∈ [0, 1]. If R_a ≤ ε, select the action index argmax_a Q_ω(s(k), a) and execute the corresponding action a(k); otherwise, select an action index at random and execute action a(k);

[0051] Step 4: Adjust the filter gain: φ_a(k+1) = φ_a(k) + a(k);

[0052] Step 5: Calculate the reward r_e = -β × abs(y(k) - r(k)) + (φ_a(k) - φ_min); the state changes to s(k+1);

[0053] Step 6: Store {s(k), a(k), r_e(k), s(k+1)} in the experience pool R, where r_e(k) is the reward calculated in Step 5;

[0054] Step 7: Once the number of samples in R reaches the threshold, sample M transitions {s(i), a(i), r_e(i), s(i+1)}, i = 1, ..., M. For each transition, compute a temporary (target) term using the target network; then update the current network Q_ω by minimizing the target loss function;

[0055] Step 8: Every m sampling steps, copy the parameters, ω⁻ ← ω, to update the target network Q_ω⁻.
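Steps 3 and 6-8 follow the usual deep Q-network pattern, which can be sketched in Python as below. This is a hedged sketch, not the patented implementation. The linear `LinearQ` class is a hypothetical stand-in for the deep network so the sketch runs with NumPy alone, and the gradient update of Step 7 is omitted. The target y_i = r_e(i) + γ·max_a Q_ω⁻(s(i+1), a) is the standard DQN target, assumed here because the text does not print the temporary term. The action rule mirrors Step 3 as written, where ε is the probability of acting greedily.

```python
import random
from collections import deque

import numpy as np


class LinearQ:
    """Hypothetical linear stand-in for Q_omega: Q(s, .) = s @ W."""

    def __init__(self, state_dim, n_actions=3, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(state_dim, n_actions))

    def __call__(self, s):
        return np.atleast_2d(s) @ self.W           # shape (batch, n_actions)

    def copy_from(self, other):
        self.W = other.W.copy()                     # Step 8: omega^- <- omega


class ReplayPool:
    """Experience replay pool R (Steps 6-7)."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)        # oldest samples are dropped first

    def store(self, s, a, r_e, s_next):
        self.buffer.append((np.asarray(s, float), int(a), float(r_e),
                            np.asarray(s_next, float)))

    def sample(self, batch_size):
        s, a, r, s_next = zip(*random.sample(list(self.buffer), batch_size))
        return np.stack(s), np.array(a), np.array(r), np.stack(s_next)

    def __len__(self):
        return len(self.buffer)


def select_action(q_values, epsilon, rng=random):
    """Step 3 as written: greedy if R_a <= epsilon, otherwise a random action."""
    if rng.random() <= epsilon:
        return int(np.argmax(q_values))
    return rng.randrange(len(q_values))


def td_targets(q_target, r, s_next, gamma=0.9):
    """Standard DQN target y_i = r_i + gamma * max_a Q_target(s_{i+1}, a) (assumed)."""
    return r + gamma * np.max(q_target(s_next), axis=1)


def td_loss(q_current, s, a, y):
    """Mean squared error between y_i and Q_omega(s(i), a(i)), minimized in Step 7."""
    q_sa = q_current(s)[np.arange(len(a)), a]
    return float(np.mean((y - q_sa) ** 2))


# Hypothetical smoke run: fill the pool with dummy transitions and form one batch.
q, q_target = LinearQ(4), LinearQ(4, seed=1)
q_target.copy_from(q)
pool = ReplayPool(capacity=5)
for i in range(8):
    pool.store(np.full(4, float(i)), i % 3, float(i), np.full(4, i + 1.0))
s, a, r, s_next = pool.sample(3)
y = td_targets(q_target, r, s_next, gamma=0.9)
loss = td_loss(q, s, a, y)
```

With the embodiment settings in [0150] (pool size 10000, γ = 0.9, ε = 0.9, 64 samples per update, m = 1000), `copy_from` would run every m steps and `td_loss` would be minimized by an optimizer from a deep-learning library.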

[0056] Furthermore, the internal model system is expressed by the following formula:

[0057] $x_I(k+1) = A_I x_I(k) + B_I[r(k) - y(k)]$

[0058] In the formula, x_I(k+1) is the state of the internal model system at time k+1, and A_I and B_I are system matrices whose dimensions match the internal model system.
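A short Python sketch of the internal model update in [0057]. For the straight-line (ramp) trajectory of the embodiment, a double-integrator internal model is a common choice; the matrix A_I below is a hypothetical example (the text gives only B_I = [0 1]ᵀ in [0150]), and a scalar tracking error r(k) − y(k) is assumed.

```python
import numpy as np


def internal_model_step(A_I, B_I, x_I, r, y):
    """x_I(k+1) = A_I x_I(k) + B_I [r(k) - y(k)] for a scalar tracking error."""
    return (np.asarray(A_I, float) @ np.asarray(x_I, float)
            + np.asarray(B_I, float) * (r - y))


# Hypothetical double-integrator internal model (two poles at z = 1), which
# contains the modes of a ramp reference r(k) = a k + b.
A_I = np.array([[1.0, 1.0], [0.0, 1.0]])
B_I = np.array([0.0, 1.0])

x_I = np.zeros(2)
history = []
for k in range(3):
    x_I = internal_model_step(A_I, B_I, x_I, r=1.0, y=0.0)  # constant unit error
    history.append(x_I.copy())
```

A persistent tracking error makes x_I grow without bound, so in a stable closed loop the error has to vanish; this is the usual internal-model argument for why the input signal internal model improves tracking accuracy.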

[0059] Compared with the prior art, the beneficial effects of the present invention are as follows: by adding compensation for the disturbance estimate to the system control input u(k), the influence of the total system disturbance can be effectively and actively suppressed, and the disturbance estimation value... incorporates a comprehensive performance trade-off between system disturbances and noise effects, ultimately achieving high-precision tracking control of automated equipment.

Attached Figure Description

[0060] Figure 1 is a framework diagram of the reinforcement-learning-based equivalent input interference estimator of this invention;

[0061] Figure 2 is a schematic diagram of the total system disturbance and measurement noise applied in this invention;

[0062] Figure 3 shows the reinforcement learning iteration curve of this invention;

[0063] Figure 4 shows the variation of the filter gain φ_a in this invention;

[0064] Figure 5 is a comparison chart of the disturbance estimates in this invention;

[0065] Figure 6 is a comparison chart of the output errors in this invention.

Detailed Implementation

[0066] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0067] In one embodiment, as shown in Figure 1, a high-performance anti-interference system based on a deep reinforcement learning interference estimator is proposed for trajectory tracking control in machining processes. The system includes an input signal internal model, a state feedback controller, an equivalent input interference estimator, and a state observer. The input signal internal model processes the input signal as follows:

[0068] $x_I(k+1) = A_I x_I(k) + B_I[r(k) - y(k)]$ (1)

[0069] In the formula, k = 1, 2, 3, … is the sampling instant, x_I(k) is the internal model state, r(k) is the input signal, y(k) is the system output, and A_I and B_I are the internal model matrices;

[0070] The state feedback controller computes the state feedback gains by pole placement, and the state feedback output u_f(k) is then expressed as:

[0071] $u_f(k) = K_I x_I(k) + K_p \hat{x}(k)$

[0072] In the formula, K_I and K_p are the state feedback gains, and x̂(k) is the estimate of the system state x(k).

[0073] The state observer is used for observing and estimating the system state of the motion control system, and is expressed by the following formula:

[0074] $\hat{x}(k+1) = A\hat{x}(k) + B u_f(k) + L[y(k) - \hat{y}(k)], \quad \hat{y}(k) = C\hat{x}(k)$

[0075] In the formula, x̂(k) is the estimate of the system state x(k), ŷ(k) is the estimate of the system output y(k), L is the state observer gain, u_f(k) is the state feedback output, and A, B, and C are the system matrices.

[0076] The equivalent input disturbance estimator is used to estimate the total disturbance composed of various external disturbances, and obtain the total disturbance estimate. The formula is as follows:

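[0077] $\tilde{d}(k) = B^{+} L [C \Delta x(k) + v(k)] + u_f(k) - u(k), \quad \Delta x(k) = x(k) - \hat{x}(k)$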

[0078] In the formula, x̂(k) is the estimate of the system state x(k); B⁺ is the Moore-Penrose generalized inverse of B, B⁺ = (BᵀB)⁻¹Bᵀ; u(k) is the control input; u_f(k) is the state feedback output; Δx(k) is the observation error; and v(k) is the measurement noise of the system.
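The estimate can be sketched in Python as follows. It assumes the common EID form d̃(k) = B⁺L[y(k) − C x̂(k)] + u_f(k) − u(k), where y(k) − C x̂(k) = C Δx(k) + v(k), and a single-input plant so that B is a column vector. B = [0 4.41]ᵀ and L = [1.32 27.28] are read from the example values in [0150]; the function names are hypothetical.

```python
import numpy as np


def pseudo_inverse(B):
    """B+ = (B^T B)^-1 B^T for a full-column-rank B (1-D input treated as a column)."""
    B = np.asarray(B, float)
    if B.ndim == 1:
        B = B.reshape(-1, 1)
    return np.linalg.solve(B.T @ B, B.T)           # shape (m, n)


def eid_estimate(B_pinv, L, C, x_hat, y, u_f, u):
    """d_tilde(k) = B+ L [y(k) - C x_hat(k)] + u_f(k) - u(k) for a single input."""
    innovation = y - np.asarray(C, float) @ np.asarray(x_hat, float)
    return (B_pinv @ (np.asarray(L, float) * innovation)).item() + u_f - u


B_pinv = pseudo_inverse([0.0, 4.41])
d_tilde = eid_estimate(B_pinv, [1.32, 27.28], [1.0, 0.0], [0.0, 0.0],
                       y=0.1, u_f=1.0, u=0.5)
```

Solving the normal equations with `np.linalg.solve` avoids forming an explicit inverse; `np.linalg.pinv` gives the same B⁺ for a full-column-rank B. Because the innovation contains v(k), any noise passes into d̃(k) through B⁺L, which is why the filter F(z) is needed.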

[0079] The final output of the equivalent input interference estimator is d̂(k), the filtered estimate of the total disturbance d̃(k), where the filter is denoted F(z); the filtered disturbance estimate d̂(k) is expressed as follows:

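[0080] $\hat{d}(k) = Z^{-1}[\hat{D}(z)]$

[0081] $\hat{D}(z) = F(z)\,\tilde{D}(z)$

[0082] $\tilde{D}(z) = Z[\tilde{d}(k)]$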

[0083] In the formula, Z[·] and Z⁻¹[·] represent the Z-transform and the inverse Z-transform, respectively, and D̂(z) and D̃(z) are the Z-transforms of d̂(k) and d̃(k), respectively.

[0084] The filter F(z) is designed with an intelligent learning mechanism based on deep reinforcement learning, which adaptively adjusts the filter bandwidth according to the disturbance characteristics and the measurement noise; it is expressed by the following formula:

[0085]

[0086] In the formula, z is the Z-transform operator, e is the exponential function, T is the sampling period of the system, ω_a is the cutoff angular frequency, and φ_a is the filter gain. Adjusting φ_a changes the control performance of the system, so that the disturbance estimation and noise attenuation capabilities can be adjusted adaptively.

[0087] Furthermore, a deep Q-network is used to learn and adjust the gain φ_a of the filter F(z); the state space, action space, and reward function are designed as follows:

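[0088] $s(k) = [\,y(k) - r(k),\ x(k),\ \varphi_a(k)\,]$ (9)

[0089] $E = \{-e_l,\ 0,\ e_u\},\ e_l, e_u \in (0, 1); \quad \varphi_a(k+1) = \varphi_a(k) + a(k),\ a(k) \in E; \quad \varphi_a(k) \in (\varphi_{\min}, \varphi_{\max})$ (10)

[0090] $r_e = -\beta \left| y(k) - r(k) \right| + \left( \varphi_a(k) - \varphi_{\min} \right)$ (11)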

[0091] s(k) is the state space, r(k) is the input signal, y(k) is the system output of the motion control system, y(k) - r(k) is the trajectory tracking error, and φ_a(k) is the filter gain in the k-th control cycle; E is the action space, and the action a(k) decreases, holds, or increases the gain φ_a(k); e_l is the decrement and e_u is the increment; φ_max and φ_min are the upper and lower bounds of φ_a(k), and φ_max can generally be set to 1. When a sudden disturbance occurs, the decrease action reduces φ_a(k), which improves the accuracy of disturbance estimation and increases the stability margin of the closed-loop system; conversely, when noise suppression matters more, the increase action raises φ_a(k) to attenuate measurement noise. When the control performance is good, the hold action keeps φ_a(k) unchanged. r_e is the reward function, abs(·) is the absolute value function, and β is a weight parameter set to a positive constant. The first term, abs(y(k) - r(k)), is the absolute output error, which directly reflects the disturbance suppression effect: a smaller value indicates higher disturbance reconstruction accuracy. The second term, φ_a(k) - φ_min, is the associated noise suppression index.

[0092] The steps by which the filter F(z) uses a deep Q-network to learn and adjust the filter gain φ_a are as follows:

[0093]

[0094] In summary, the system control input u(k) with disturbance compensation and noise suppression terms can be expressed as follows:

[0095] $u(k) = u_f(k) - \hat{d}(k)$

[0096] Therefore, by adding compensation for the disturbance estimate to the system control input u(k), the influence of the total system disturbance can be effectively suppressed, and the disturbance estimate value... It incorporates a comprehensive performance trade-off between system disturbances and noise effects, ultimately achieving high-precision tracking control of automated equipment.

[0097] In another embodiment, a high-performance anti-interference system based on a deep reinforcement learning interference estimator runs a high-performance anti-interference method based on a deep reinforcement learning interference estimator, the specific steps of which are as follows:

[0098] Step S1: Establish the equivalent input disturbance state space model of the motion control system.

[0099] First, the motion control system is represented as:

[0100] $x(k+1) = A x(k) + B u(k) + B_d d(k), \quad y(k) = C x(k) + v(k)$ (13)

[0101] In the formula, x(k) = [x1 x2]ᵀ is the system state in formula (13), where x1 and x2 are the system position and velocity, respectively; u(k) is the system control input; y(k) is the system output; v(k) is the sensor measurement noise; A, B, and C are system matrices whose dimensions match the system order; d(k) is the external disturbance caused by load inertia changes, tool wear, uncertain external forces, model uncertainties, and similar factors; and B_d is the gain matrix of the external disturbance. Equation (13) satisfies the condition that (A, B, C) is controllable and observable.

[0102] Next, the concept of equivalent input disturbance is introduced: d_e(k) is defined as the equivalent input disturbance of formula (13), i.e., the effect of d_e(k) on the system output is equivalent to the effect of the external disturbance d(k) on the system output y(k). The equivalent motion control system is therefore:

[0103] $x(k+1) = A x(k) + B[u(k) + d_e(k)], \quad y(k) = C x(k) + v(k)$ (14)

[0104] Step S2: Design a state observer to obtain a stable estimate of the system state.

[0105] The state observer is designed as follows:

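[0106] $\hat{x}(k+1) = A\hat{x}(k) + B u_f(k) + L[y(k) - \hat{y}(k)], \quad \hat{y}(k) = C\hat{x}(k)$ (15)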

[0107] In the formula, x̂(k) is the estimate of the system state, y(k) is the system output, ŷ(k) is the estimate of the system output, u_f(k) is the state feedback output, and L is the state observer gain.

[0108] The observer gain L is then designed by pole placement so that the observer is stable.
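Pole placement for the observer gain can be sketched in Python with Ackermann's formula, L = φ(A)·O⁻¹·[0 … 0 1]ᵀ, where O is the observability matrix of (A, C) and φ is the desired characteristic polynomial. This is one standard way to carry out the pole placement mentioned here, not necessarily the one used in the patent. The matrix A below is a hypothetical discretized double integrator, since the example in [0150] does not list A.

```python
import numpy as np


def observer_gain_ackermann(A, C, poles):
    """Return L so that eig(A - L C) equals `poles` (single-output pair (A, C))."""
    A = np.asarray(A, float)
    C = np.asarray(C, float).reshape(1, -1)
    n = A.shape[0]
    # Observability matrix O = [C; C A; ...; C A^(n-1)].
    O = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(n)])
    # Desired characteristic polynomial coefficients [1, c1, ..., cn].
    coeffs = np.real(np.poly(poles))
    phi_A = sum(c * np.linalg.matrix_power(A, n - i) for i, c in enumerate(coeffs))
    e_n = np.zeros(n)
    e_n[-1] = 1.0
    return phi_A @ np.linalg.solve(O, e_n)


# Hypothetical plant: position/velocity double integrator sampled at 10 ms.
A = np.array([[1.0, 0.01], [0.0, 1.0]])
C = np.array([1.0, 0.0])
L = observer_gain_ackermann(A, C, poles=[0.5, 0.6])
```

The embodiment lists observer poles [-200 -200]. If these are continuous-time poles, their discrete-time equivalents at h = 10 ms would be z = e^(−200 × 0.01) = e^(−2) ≈ 0.135; how the poles were actually specified is an assumption.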

[0109] Step S3: Design the deep reinforcement learning disturbance estimator, consisting of an equivalent input disturbance estimator and a filter, to achieve accurate disturbance estimation.

[0110] The equivalent input disturbance estimator is used to estimate the total disturbance composed of various external disturbances, and obtain the total disturbance estimate. The formula is as follows:

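[0111] $\tilde{d}(k) = B^{+} L [C \Delta x(k) + v(k)] + u_f(k) - u(k), \quad \Delta x(k) = x(k) - \hat{x}(k)$ (16)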

[0112] In the formula, B⁺ is the Moore-Penrose generalized inverse of B, B⁺ = (BᵀB)⁻¹Bᵀ; u(k) is the system control input; Δx(k) is the observation error; x̂(k) is the estimate of the state x(k); v(k) is the measurement noise; and u_f(k) is the state feedback output.

[0113] As the analysis shows, computing the current ... with formula (16) requires the control input u(k) at the current moment, while the calculation of u(k) is itself related to ...; this mutual dependence creates a causality problem. Therefore, a filter F(z) of the following form is designed:

[0114]

[0115] In the formula, T is the sampling period of the system, e is the exponential function, ω_a is the cutoff angular frequency, and φ_a is the filter gain. Adjusting φ_a changes the control performance of the system, so that the disturbance estimation and noise attenuation capabilities can be adjusted adaptively.

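[0116] $\hat{d}(k) = Z^{-1}[\hat{D}(z)]$

[0117] $\hat{D}(z) = F(z)\,\tilde{D}(z)$

[0118] $\tilde{D}(z) = Z[\tilde{d}(k)]$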

[0119] In the formula, Z[·] and Z⁻¹[·] represent the Z-transform and the inverse Z-transform, respectively, and d̂(k) is the filtered disturbance estimate. According to formula (18), letting F(z) ≈ 1 gives:

[0120]

[0121] It then follows that

[0122]

[0123] Thus, accurate estimation of the total disturbance signal is achieved.

[0124] Step S4: Design a filter gain adjustment mechanism based on deep reinforcement learning that suppresses noise effects while achieving high-precision disturbance estimation.

[0125] Furthermore, to balance disturbance estimation against noise attenuation, φ_a can be adjusted to place different emphasis on disturbance suppression and noise attenuation. Therefore, a deep Q-network is used to learn φ_a under a given disturbance and random measurement noise (the environment shown in Figure 2), where the disturbance d(t) can be expressed by the following formula:

[0126]

[0127] In the formula, N is a natural number and i = 1, 2, ..., N; d(t) is the sum of two groups of disturbance signals, d1(t) and d2(t), with amplitudes A_d1 and A_d2, respectively; t is the system time; and d(k) is the discretely sampled signal of d(t). The measurement noise is v(k) = 150 × random[-1, 1] × 10⁻⁶, where random[-1, 1] denotes a random number in [-1, 1].
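The measurement noise of [0127] can be generated in Python as below, assuming random[-1, 1] is drawn from a uniform distribution. The sampling helper for d(k) is hypothetical and takes d(t) as a function, because the closed form of d(t) is not legible here; the lambda used below is an illustration, not the patent's d1(t) + d2(t).

```python
import numpy as np


def measurement_noise(n_samples, amplitude=150e-6, seed=None):
    """v(k) = 150 x random[-1, 1] x 1e-6, with random[-1, 1] taken as uniform (assumed)."""
    rng = np.random.default_rng(seed)
    return amplitude * rng.uniform(-1.0, 1.0, size=n_samples)


def sample_disturbance(d_of_t, T, n_samples):
    """d(k) = d(k T) for k = 1, ..., n_samples: discrete samples of a continuous d(t)."""
    t = T * np.arange(1, n_samples + 1)
    return np.array([d_of_t(ti) for ti in t])


v = measurement_noise(1000, seed=0)
# Hypothetical d(t) used only for illustration (not the patent's d1 + d2).
d = sample_disturbance(lambda t: 2.0 * t, T=0.01, n_samples=3)
```

Passing a fixed `seed` makes a training episode reproducible, which helps when comparing learned filter-gain trajectories across runs.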

[0128] Furthermore, the state space, action space, and reward function of the deep Q-network are designed as follows:

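[0129] $s(k) = [\,y(k) - r(k),\ x(k),\ \varphi_a(k)\,]$ (24)

[0130] $E = \{-e_l,\ 0,\ e_u\},\ e_l, e_u \in (0, 1); \quad \varphi_a(k+1) = \varphi_a(k) + a(k),\ a(k) \in E; \quad \varphi_a(k) \in (\varphi_{\min}, \varphi_{\max})$ (25)

[0131] $r_e = -\beta \left| y(k) - r(k) \right| + \left( \varphi_a(k) - \varphi_{\min} \right)$ (26)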

[0132] s(k) is the state space, r(k) is the input signal, y(k) is the system output of the motion control system, y(k) - r(k) is the trajectory tracking error, and φ_a(k) is the filter gain in the k-th control cycle; E is the action space, and the action a(k) decreases, holds, or increases the gain φ_a(k); e_l is the decrement and e_u is the increment; φ_max and φ_min are the upper and lower bounds of φ_a(k), and φ_max can generally be set to 1. When a sudden disturbance occurs, the decrease action reduces φ_a(k), which improves the accuracy of disturbance estimation and increases the stability margin of the closed-loop system; conversely, when noise suppression matters more, the increase action raises φ_a(k) to attenuate measurement noise. When the control performance is good, the hold action keeps φ_a(k) unchanged. r_e is the reward function, abs(·) is the absolute value function, and β is a weight parameter set to a positive constant. The first term, abs(y(k) - r(k)), is the absolute output error, which directly reflects the disturbance suppression effect: a smaller value indicates higher disturbance reconstruction accuracy. The second term, φ_a(k) - φ_min, is the associated noise suppression index.

[0133] The training process is as follows:

[0134]

[0135]

[0136] Through the above steps, the filter automatically balances disturbance suppression and noise attenuation: the total disturbance estimate d̃(k) is filtered with the learned filter gain to output the disturbance estimate d̂(k), and the disturbance estimate for each control cycle can be calculated using equations (16)-(20).

[0137] Step S5: Design a state feedback controller to achieve stable tracking control of the system.

[0138] Given a reference input r(k), the trajectory tracking accuracy is improved by using an internal model of the input signal. The internal model system is designed as follows:

[0139] $x_I(k+1) = A_I x_I(k) + B_I[r(k) - y(k)]$ (27)

[0140] In the formula, x_I(k) is the internal model system state, y(k) is the system output, and A_I and B_I are system matrices whose dimensions match the internal model system order;

[0141] Combining equations (14) and (27), the specific form of state feedback control for the motion control system can be obtained as follows:

[0142]

[0143] Design the gains K_I and K_p of the state feedback controller by pole placement, where K_I is the feedback gain of the internal model system state x_I(k) and K_p is the feedback gain of the observer state x̂(k); the state feedback output u_f(k) can then be represented as:

[0144] $u_f(k) = K_I x_I(k) + K_p \hat{x}(k)$

[0145] Step S6: Design a control input with disturbance compensation to implement a robust control strategy based on deep reinforcement learning to resist disturbances.

[0146] Based on the disturbance estimate d̂(k), a negative compensation term is added to the state feedback output u_f(k), giving the control input u(k) of the motion control system with disturbance compensation:

[0147] $u(k) = u_f(k) - \hat{d}(k)$

[0148] Therefore, based on the proposed intelligent anti-interference method using a deep reinforcement learning-based interference estimator, high-precision tracking control is achieved through effective compensation of the total system disturbance and effective suppression of measurement noise.

[0149] The effectiveness and superiority of the method are verified through a case study of tracking a straight line trajectory.

[0150] The motion control system matrices are B = [0 4.41]ᵀ and C = [1 0]; the total disturbance and measurement noise of the control system are shown in Figure 2. According to equation (29), the internal model of the motion control system is set with B_I = [0 1]ᵀ. The control period is h = 10 ms; the equivalent input disturbance estimator is configured with L = [1.32 27.28] and observer poles [-200 -200]; the filter parameter is ω_a = 100; and the state feedback gain is K = [244.482.50]. The reinforcement learning parameters of the improved filter are: action adjustments e_l = e_u = 0.1, minimum filter gain φ_min = 0.4, maximum filter gain φ_max = 1, reward weight β = 400, deep Q-network experience pool size R = 10000, exploration rate ε = 0.9, discount factor γ = 0.9, N = 64 samples drawn from memory per update, target-network exchange interval m = 1000, learning time 8 s, and one learning update every 4 control cycles. The effectiveness and superiority of the algorithm are verified by comparison with the conventional equivalent input disturbance estimator. Figure 3 shows the mean reward curve after 10 iterations, with the number of iterations on the horizontal axis and the cumulative reward on the vertical axis; it demonstrates the convergence of the proposed algorithm. Figure 4 shows the result of the adaptive filter gain adjustment: the gain adapts to the fast- and slow-varying characteristics of the total disturbance. The disturbance estimate comparison is shown in Figure 5, and the output error comparison in Figure 6. Figures 5 and 6 show that, under both control strategies, the motion control system subjected to external disturbances effectively suppresses disturbances and attenuates noise, exhibiting high control accuracy.
Comparative analysis shows that, compared with the traditional "Control Method Based on Equivalent Input Disturbance Estimator," the "High-Performance Anti-Interference Method Based on Deep Reinforcement Learning Disturbance Estimator" of this invention achieves better disturbance suppression under abrupt disturbances and excellent noise suppression under slowly varying disturbances. In practical applications, the control behavior can be tuned to emphasize either objective, thereby effectively improving the trajectory tracking performance of the motion control system and achieving high-precision tracking control.
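The case-study settings in [0150] can be collected into one configuration object, which also makes the timing arithmetic explicit. This is a minimal Python sketch: the class and field names are hypothetical, only the numerical values come from the text, and it assumes the 8-second learning time refers to one learning run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseStudyConfig:
    """Case-study parameters from [0150]; the field names are hypothetical."""

    h_ms: int = 10                  # control period h, in milliseconds
    omega_a: float = 100.0          # filter cutoff angular frequency
    e_l: float = 0.1                # gain decrease per "decrease" action
    e_u: float = 0.1                # gain increase per "increase" action
    phi_min: float = 0.4            # minimum filter gain
    phi_max: float = 1.0            # maximum filter gain
    beta: float = 400.0             # reward weight
    pool_size: int = 10_000         # experience pool R
    epsilon: float = 0.9            # exploration rate
    gamma: float = 0.9              # discount factor
    batch_size: int = 64            # N samples drawn per learning cycle
    target_sync: int = 1_000        # exchange frequency m
    learning_time_ms: int = 8_000   # learning time (8 s), assumed per run
    learn_every: int = 4            # learn once every 4 control cycles

    def control_cycles(self) -> int:
        """Control cycles in one learning run: learning time divided by h."""
        return self.learning_time_ms // self.h_ms

    def learning_steps(self) -> int:
        """Learning steps in one run when learning every `learn_every` cycles."""
        return self.control_cycles() // self.learn_every


cfg = CaseStudyConfig()
print(cfg.control_cycles(), cfg.learning_steps())  # 800 200
```

With these values, one 8-second run spans 800 control cycles, and learning once every 4 cycles gives 200 learning steps per run. Integer milliseconds are used so the division is exact.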

[0151] The embodiments described above merely illustrate several implementations of this application. Although they are described in relatively specific detail, they should not be construed as limiting the scope of the invention patent. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be determined by the appended claims.

Claims

1. A high-performance anti-interference method based on a deep reinforcement learning interference estimator, applied to a motion control system, characterized in that the method includes: establishing an equivalent input disturbance state-space model of the motion control system; designing a state observer that obtains the state observation $\hat{x}(k)$ from the system output $y(k)$ of the motion control system and the state observer gain; designing an internal model system, and establishing a state feedback controller for the motion control system based on the equivalent input disturbance state-space model of the motion control system and the internal model system, wherein the state feedback controller obtains the state feedback output $u_f(k)$ from the internal model state $x_I(k)$ and the state observation $\hat{x}(k)$ as
$$u_f(k) = K_I x_I(k) + K_P \hat{x}(k),$$
where $k = 1, 2, 3, \ldots$ is the sampling instant, $K_I$ is the feedback gain on the internal model state, and $K_P$ is the feedback gain on the state observation; designing a deep reinforcement learning interference estimator that includes an equivalent input disturbance estimator and a filter $F(z)$, wherein the equivalent input disturbance estimator obtains the total disturbance estimate $\tilde{d}(k)$ from the observer gain $L$, the state feedback output $u_f(k)$, and the system control input $u(k)$, and the filter gain of the filter is adjusted based on deep reinforcement learning; the final output of the deep reinforcement learning interference estimator is the filtered estimate $\hat{d}(k)$ of the total disturbance, expressed as
$$\hat{d}(k) = \mathcal{Z}^{-1}\big[\hat{D}(z)\big] = \mathcal{Z}^{-1}\big[F(z)\,\tilde{D}(z)\big],$$
where $\mathcal{Z}$ and $\mathcal{Z}^{-1}$ denote the Z-transform and the inverse Z-transform, respectively, and $\hat{D}(z)$ and $\tilde{D}(z)$ are the Z-transforms of the disturbance estimate $\hat{d}(k)$ and the total disturbance estimate $\tilde{d}(k)$, respectively; based on the disturbance estimate $\hat{d}(k)$, a negative compensation term is added to the state feedback output $u_f(k)$, so that the system control input with disturbance compensation is
$$u(k) = u_f(k) - \hat{d}(k),$$
where $u(k)$ is the system control input of the motion control system; wherein adjusting the filter gain based on deep reinforcement learning includes: using a deep Q-network to learn the filter gain under given disturbances and random sensor measurement noise, with the state space, action space, and reward function of the deep Q-network designed as follows, where $r(k)$ is the input signal, $y(k)$ is the system output of the motion control system, $e(k)$ is the trajectory tracking error, and $\phi(k)$ and $\phi(k-1)$ are the filter gains of the $k$-th and $(k-1)$-th control cycles; the action space consists of three actions on the gain, namely decrease, hold, and increase, where $e_l$ denotes the decrease value, $e_u$ denotes the increase value, and $\phi_{\min}$ and $\phi_{\max}$ denote the lower and upper bounds of the gain; and in the reward function, abs denotes the absolute value function, the weighting parameter $\beta$ is set to a positive constant, one term is the absolute value of the output error, and the other is the related noise suppression index.

2. The high-performance anti-interference method based on a deep reinforcement learning interference estimator according to claim 1, characterized in that establishing the equivalent input disturbance state-space model of the motion control system includes: representing the motion control system as
$$x(k+1) = A x(k) + B u(k) + B_d d(k), \qquad y(k) = C x(k) + n(k),$$
where $x(k) = [x_1(k) \;\; x_2(k)]^T$ is the system state of the motion control system at time $k$, with $x_1(k)$ and $x_2(k)$ the system position and velocity, respectively; $x(k+1)$ is the system state at time $k+1$; $u(k)$ is the system control input; $y(k)$ is the system output; $n(k)$ is the sensor measurement noise; $A$, $B$, and $C$ are system matrices with dimensions matching the system order; $d(k)$ is the external disturbance; and $B_d$ is the gain matrix corresponding to the external disturbance, subject to the constraint that the system $(A, B, C)$ is both controllable and observable; introducing the concept of equivalent input disturbance by defining $d_e(k)$ as the equivalent input disturbance of the motion control system, i.e., the effect of $d_e(k)$ on the system output $y(k)$ is equivalent to that of the external disturbance $d(k)$, so that the motion control system is rewritten as
$$x(k+1) = A x(k) + B\big[u(k) + d_e(k)\big], \qquad y(k) = C x(k) + n(k),$$
which gives the equivalent input disturbance state-space model of the motion control system.
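The two model forms in claim 2 can be compared numerically in the special case where the disturbance enters through the input channel ($B_d = B$), in which the equivalent input disturbance is simply $d_e(k) = d(k)$. This is a sketch, not the patent's procedure: $B$ and $C$ are the case-study values, while $A$, the test input, the step disturbance, the noise level, and the function names are hypothetical.

```python
import numpy as np

# B and C are the case-study matrices of [0150]; A is not given there,
# so this value is a hypothetical placeholder.
A = np.array([[1.0, 0.01], [0.0, 0.95]])
B = np.array([[0.0], [4.41]])
C = np.array([[1.0, 0.0]])


def step_original(x, u, d, n, B_d):
    """Return (x(k+1), y(k)) for x(k+1) = A x + B u + B_d d, y = C x + n."""
    return A @ x + B * u + B_d * d, C @ x + n


def step_eid(x, u, d_e, n):
    """Return (x(k+1), y(k)) for x(k+1) = A x + B (u + d_e), y = C x + n."""
    return A @ x + B * (u + d_e), C @ x + n


# Matched special case B_d = B: the EID equals the external disturbance itself.
rng = np.random.default_rng(0)
x_orig = np.zeros((2, 1))
x_eid = np.zeros((2, 1))
y_orig, y_eid = [], []
for k in range(100):
    u = np.sin(0.05 * k)              # arbitrary test input
    d = 0.5 if k >= 50 else 0.0       # step disturbance at k = 50
    n = 1e-3 * rng.standard_normal()  # same noise sample fed to both models
    x_orig, y1 = step_original(x_orig, u, d, n, B_d=B)
    x_eid, y2 = step_eid(x_eid, u, d, n)
    y_orig.append(y1.item())
    y_eid.append(y2.item())

print(np.allclose(y_orig, y_eid))  # True
```

In the unmatched case ($B_d$ not in the range of $B$), $d_e(k)$ is not directly available; reconstructing it is the job of the equivalent input disturbance estimator.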

3. The high-performance anti-interference method based on a deep reinforcement learning interference estimator according to claim 1, characterized in that designing the state observer includes: designing the state observer according to the following formula, where $\hat{x}(k+1)$ denotes the state observation at time $k+1$; $A$, $B$, and $C$ are system matrices with dimensions matching the system order; $\hat{y}(k)$ is the observed value of the system output $y(k)$; and the remaining gain is the gain of the state observer.
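The observer formula itself is not reproduced in this text, so the following is a sketch under an assumption: it uses the conventional discrete-time EID form, in which the observer is driven by the state feedback output $u_f(k)$ and the raw estimate is $\tilde{d}(k) = B^{+}L[y(k) - C\hat{x}(k)] + u_f(k) - u(k)$ with $B^{+} = (B^T B)^{-1}B^T$. The function names, the matrix $A$, and the sample signals are hypothetical; $B$, $C$, and $L$ are the case-study values.

```python
import numpy as np


def observer_step(x_hat, y, u_f, A, B, C, L):
    """Assumed observer: x_hat(k+1) = A x_hat(k) + B u_f(k) + L [y(k) - C x_hat(k)].

    Returns x_hat(k+1) and the output innovation y(k) - C x_hat(k).
    """
    innovation = y - C @ x_hat
    return A @ x_hat + B @ u_f + L @ innovation, innovation


def eid_estimate(innovation, u_f, u, B, L):
    """Assumed raw EID estimate: B^+ L [y(k) - C x_hat(k)] + u_f(k) - u(k)."""
    B_pinv = np.linalg.pinv(B)  # equals (B^T B)^{-1} B^T when B has full column rank
    return B_pinv @ L @ innovation + u_f - u


A = np.array([[1.0, 0.01], [0.0, 0.95]])  # hypothetical
B = np.array([[0.0], [4.41]])
C = np.array([[1.0, 0.0]])
L = np.array([[1.32], [27.28]])           # case-study value of L
x_hat = np.zeros((2, 1))
y, u_f, u = np.array([[0.2]]), np.array([[1.0]]), np.array([[0.8]])

x_next, innov = observer_step(x_hat, y, u_f, A, B, C, L)
d_tilde = eid_estimate(innov, u_f, u, B, L)

# If the gain lies in the range of B, the observer update equals a plant
# driven by u(k) plus the estimated equivalent input disturbance.
L_matched = 3.0 * B                       # illustrative gain in range(B)
x_m, innov_m = observer_step(x_hat, y, u_f, A, B, C, L_matched)
d_m = eid_estimate(innov_m, u_f, u, B, L_matched)
print(np.allclose(x_m, A @ x_hat + B @ (u + d_m)))  # True
```

The final check illustrates the idea behind this form: the estimate $\tilde{d}(k)$ is the input correction that makes the observer look like the plant driven by $u(k)$.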

4. The high-performance anti-interference method based on a deep reinforcement learning interference estimator according to claim 1, characterized in that the filter $F(z)$ is expressed by the following formula, where $z$ is the Z-transform operator, $e$ denotes the exponential function, $h$ is the sampling period of the motion control system, $\omega_a$ is the cutoff angular frequency, and $\phi$ is the filter gain.
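A discrete first-order low-pass filter whose gain can be changed at every control cycle illustrates the filter's role. This is a sketch under an assumption: the factor $(1 - e^{-\omega_a h})/(z - e^{-\omega_a h})$ is the zero-order-hold discretization of $\omega_a/(s + \omega_a)$, but multiplying it by the gain $\phi$ is a hypothetical reading of how $\phi$ enters $F(z)$, and the class name is also hypothetical.

```python
import math


class GainedLowPass:
    """Assumed filter F(z) = phi (1 - a) / (z - a) with a = exp(-omega_a h).

    Difference equation: d_hat(k+1) = a d_hat(k) + phi (1 - a) d_tilde(k).
    """

    def __init__(self, omega_a: float, h: float, phi: float):
        self.a = math.exp(-omega_a * h)
        self.phi = phi      # may be overwritten each cycle by the learned gain
        self.state = 0.0    # d_hat(k)

    def step(self, d_tilde: float) -> float:
        """Return d_hat(k), then advance the internal state to d_hat(k+1)."""
        out = self.state
        self.state = self.a * self.state + self.phi * (1.0 - self.a) * d_tilde
        return out


# Case-study values omega_a = 100 and h = 10 ms; phi = 0.7 is illustrative.
f = GainedLowPass(omega_a=100.0, h=0.01, phi=0.7)
outputs = [f.step(1.0) for _ in range(50)]
print(round(outputs[-1], 6))  # 0.7
```

For a constant input the output settles to $\phi$ times that input, so $\phi$ scales how much of the raw estimate is fed back as compensation; in the claimed method the learned gain would be written to `phi` before each `step` call.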

5. The high-performance anti-interference method based on a deep reinforcement learning interference estimator according to claim 1, characterized in that the filter uses a deep Q-network to learn and adjust the filter gain, with the following steps:
Step 1: initialize the network $Q$ with random network parameters $\theta$; copy the same parameters, $\theta^- = \theta$, to initialize the target network $\hat{Q}$; initialize the experience replay pool $D$; and select the discount factor $\gamma$ and the exploration probability $\varepsilon$;
Step 2: select a state $s$ from the state space;
Step 3: randomly generate a threshold; if it is less than $\varepsilon$, select the action number that maximizes $Q$ and execute that action; otherwise, randomly select an action number and execute that action;
Step 4: adjust the filter gain $\phi$;
Step 5: calculate the reward $r$; the state becomes $s'$;
Step 6: store $(s, a, r, s')$ in the experience pool $D$;
Step 7: when the number of data points in $D$ reaches a threshold, select $M$ data points from it; for each data point, compute a target term using the target network $\hat{Q}$, then minimize the objective loss function to update the current network $Q$;
Step 8: every $m$ samples, copy the same parameters, $\theta^- = \theta$, to update the target network $\hat{Q}$.
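Steps 1 to 8 follow the standard DQN loop with experience replay and a periodically synchronized target network. A compact sketch in Python with NumPy is below; it is not the patent's implementation. A linear Q-function stands in for the deep network, the target term is assumed to be the standard DQN target $r + \gamma \max_{a'} \hat{Q}(s', a')$ with a squared-error loss, and the target copy interval is counted in learning steps. The class name, learning rate, and sampling without replacement are hypothetical choices. Following Step 3, the greedy action is taken when a uniform random draw falls below $\varepsilon$, so $\varepsilon = 0.9$ means 90% greedy selection.

```python
from collections import deque

import numpy as np


class LinearDQN:
    """DQN skeleton; a linear Q(s, .) = W s + b stands in for the deep network."""

    def __init__(self, state_dim, n_actions=3, gamma=0.9, epsilon=0.9,
                 pool_size=10_000, batch_size=64, sync_every=1_000,
                 lr=1e-3, seed=0):
        self.rng = np.random.default_rng(seed)
        # Step 1: random parameters theta, copied into the target network theta^-.
        self.W = 0.01 * self.rng.standard_normal((n_actions, state_dim))
        self.b = np.zeros(n_actions)
        self.W_t, self.b_t = self.W.copy(), self.b.copy()
        self.pool = deque(maxlen=pool_size)  # experience replay pool D
        self.n_actions, self.gamma, self.epsilon = n_actions, gamma, epsilon
        self.batch_size, self.sync_every, self.lr = batch_size, sync_every, lr
        self.learn_steps = 0

    def q_values(self, s, target=False):
        W, b = (self.W_t, self.b_t) if target else (self.W, self.b)
        return W @ s + b

    def act(self, s):
        # Step 3: greedy if the random threshold is below epsilon, else random.
        if self.rng.random() < self.epsilon:
            return int(np.argmax(self.q_values(s)))
        return int(self.rng.integers(self.n_actions))

    def store(self, s, a, r, s_next):
        # Step 6: save the transition (s, a, r, s') in the pool.
        self.pool.append((np.asarray(s, dtype=float), int(a), float(r),
                          np.asarray(s_next, dtype=float)))

    def learn(self):
        """Step 7 and 8; returns the minibatch mean squared TD error, or None."""
        if len(self.pool) < self.batch_size:
            return None
        idx = self.rng.choice(len(self.pool), size=self.batch_size, replace=False)
        grad_W, grad_b, loss = np.zeros_like(self.W), np.zeros_like(self.b), 0.0
        for i in idx:
            s, a, r, s_next = self.pool[int(i)]
            target = r + self.gamma * np.max(self.q_values(s_next, target=True))
            td = self.q_values(s)[a] - target  # TD error of the taken action
            loss += td ** 2
            grad_W[a] += td * s                # gradient of td^2 / 2 w.r.t. W[a]
            grad_b[a] += td
        # One gradient-descent step on the mean of td^2 / 2 over the minibatch.
        self.W -= self.lr * grad_W / self.batch_size
        self.b -= self.lr * grad_b / self.batch_size
        # Step 8: copy theta to theta^- every sync_every learning steps.
        self.learn_steps += 1
        if self.learn_steps % self.sync_every == 0:
            self.W_t, self.b_t = self.W.copy(), self.b.copy()
        return loss / self.batch_size
```

In use, each learning cycle would call `act`, apply the chosen action to the filter gain, compute the reward, `store` the transition, and call `learn` once every few control cycles (every 4 in the case study).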

6. The high-performance anti-interference method based on a deep reinforcement learning interference estimator according to claim 1, characterized in that the internal model system is represented by the following formula, where $x_I(k)$ denotes the state of the internal model system at time $k$, and $A_I$ and $B_I$ are system matrices with dimensions matching the internal model system.
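The internal model can be illustrated for the straight-line (ramp) trajectory of the case study. This is a sketch under assumptions: $B_I = [0 \;\; 1]^T$ is the case-study value, but $A_I$ is not given in this text, so the companion-form matrix below, which has a double eigenvalue at $z = 1$ (the discrete-time generator of ramps), is hypothetical, as is driving the internal model with the tracking error $e(k)$ and the function name.

```python
import numpy as np

# B_I from [0150]; A_I is a hypothetical ramp internal model (double eigenvalue at z = 1).
A_I = np.array([[0.0, 1.0], [-1.0, 2.0]])
B_I = np.array([[0.0], [1.0]])


def internal_model_step(x_I, e):
    """x_I(k+1) = A_I x_I(k) + B_I e(k); e(k) is the tracking error (assumed input)."""
    return A_I @ x_I + B_I * e


# Characteristic polynomial z^2 - trace(A_I) z + det(A_I), here (z - 1)^2.
char_poly = [1.0, -np.trace(A_I), np.linalg.det(A_I)]

# With zero input, the first state generates a ramp.
x_I = np.array([[0.0], [1.0]])
ramp = []
for k in range(4):
    ramp.append(float(x_I[0, 0]))
    x_I = internal_model_step(x_I, 0.0)
print(ramp)  # [0.0, 1.0, 2.0, 3.0]
```

With zero input, the first state reproduces the ramp 0, 1, 2, 3. By the internal model principle, an error-driven internal model containing this mode lets a stable closed loop track a ramp reference without steady-state error.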