An attack detection method, apparatus, device, and storage medium
By establishing an extended system within a cyber-physical system and calculating time-varying control laws, and utilizing reinforcement learning and Kalman filtering to detect covert erroneous data injection attacks, this method solves the detection challenges in existing technologies and achieves efficient and secure attack detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NAT UNIV OF DEFENSE TECH
- Filing Date
- 2023-08-22
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies are insufficient to effectively detect covert erroneous data injection attacks in cyber-physical systems, and existing methods may affect the tracking performance of control systems or pose a security risk of leakage.
Extended systems are established on both the controlled and monitored sides. By mapping inputs and outputs to encoding functions and integrating them into the extended systems, time-varying control laws are calculated, and reinforcement learning and Kalman filtering are used to determine whether an attack exists.
It achieves efficient detection of hidden erroneous data injection attacks, avoids impacting the tracking performance of the control system, requires no additional hardware overhead, reduces the risk of security leaks, and improves the detection rate and system security.
Figure CN117092981B_ABST
Description
Technical Field
[0001] This application relates to the field of cyber-physical system security, and in particular to an attack detection method, apparatus, device, and storage medium.
Background Technology
[0002] Currently, controllers in cyber-physical systems are vulnerable to covert erroneous data injection attacks from the network, such as zero-dynamics attacks, covert attacks, and replay attacks. The chi-square detectors commonly used for attack detection cannot detect such attacks. Current countermeasures generally fall into two categories: dynamic watermarking detection and moving target detection. Both can detect covert erroneous data injection attacks effectively under certain conditions, namely high-variance watermark injection signals for the former and preset system coding information for the latter. However, both have drawbacks: high-variance watermark injection signals degrade the tracking performance of the control system and require additional hardware overhead, while preset system coding information can be leaked, creating security vulnerabilities.
Summary of the Invention
[0003] In view of this, the purpose of this application is to provide an attack detection method, apparatus, device, and storage medium that solve the prior-art difficulty of detecting covert erroneous data injection attacks in cyber-physical systems.
[0004] To address the aforementioned technical problems, this application provides an attack detection method, comprising:
[0005] The inputs and outputs of the controlled side are mapped to the encoding function and integrated into the extended system of the controlled side to obtain the time-varying control law of the controlled side;
[0006] The inputs and outputs of the monitoring side are mapped to the encoding function and integrated into the extended system of the monitoring side to obtain the time-varying control law of the monitoring side;
[0007] The first residual value is calculated based on the time-varying control law of the controlled side and the time-varying control law of the monitoring side;
[0008] It is determined that an attack exists when the first residual value is greater than a threshold.
[0009] Optionally, the step of mapping the inputs and outputs of the controlled side to an encoding function and integrating it into the extended system of the controlled side to calculate the time-varying control law of the controlled side includes:
[0010] The inputs and outputs of the controlled side are mapped to a nonlinear coding function and integrated into the extended system of the controlled side to calculate the time-varying control law of the controlled side.
[0011] Optionally, mapping the inputs and outputs of the controlled side to a nonlinear coding function and integrating it into the extended system of the controlled side to obtain the time-varying control law of the controlled side includes:
[0012] The nonlinear coding function is used to map the inputs and outputs of the controlled side to the extended system matrix of the controlled side;
[0013] The time-varying control law of the controlled side is obtained by solving the extended system matrix of the controlled side through reinforcement learning.
[0014] Optionally, the step of obtaining the time-varying control law of the controlled side through reinforcement learning based on the extended system matrix of the controlled side includes:
[0015] The time-varying control law of the controlled side is obtained by applying the Q-learning algorithm to the extended system matrix of the controlled side, where solving with the Q-learning algorithm includes:
[0016] The action value function is derived from the reward function;
[0017] By approximating and partially differentiating the action value function using a linear basis, the time-varying control law of the controlled side can be obtained.
[0018] Optionally, before determining that an attack exists, the method further includes:
[0019] Acquiring the feedback data from the controlled side and the state data from the monitoring side;
[0020] Calculating the second residual value based on the feedback data and the state data;
[0021] Calculating the total residual value based on the first residual value and the second residual value;
[0022] Determining that an attack exists when the total residual value is greater than the threshold.
[0023] Optionally, after determining that an attack exists, the method further includes:
[0024] The extended system matrix on the monitoring side is rewritten to obtain the security state estimation dynamic equation; the security state estimation matrix in the security state estimation dynamic equation includes the attack injection vector;
[0025] The true estimated state vector of the controlled side is obtained by using the Kalman filter method and the aforementioned security state estimation dynamic equation.
[0026] Optionally, obtaining the true estimated state vector of the controlled side through the Kalman filter method and the security state estimation dynamic equation includes:
[0027] The minimum of the mean square error estimate of the output vector in the security state estimation dynamic equation is taken as the first calculation formula;
[0028] The first calculation formula is solved using the gradient descent method to obtain the security state vector;
[0029] The true estimated state vector of the controlled side is obtained based on the Kalman filter method and the security state vector.
[0030] This application also provides an attack detection apparatus, including:
[0031] The controlled-side mapping and integration module is used to map the inputs and outputs of the controlled side to the encoding function and integrate them into the extended system of the controlled side to obtain the time-varying control law of the controlled side.
[0032] The monitoring-side mapping and integration module is used to map the inputs and outputs of the monitoring side to the encoding function and integrate them into the extended system of the monitoring side to obtain the time-varying control law of the monitoring side.
[0033] The calculation module is used to calculate the first residual value based on the time-varying control law of the controlled side and the time-varying control law of the monitoring side;
[0034] The judgment module is used to determine that an attack exists when the first residual value is greater than a threshold.
[0035] This application also provides an attack detection device, including:
[0036] Memory, used to store computer programs;
[0037] A processor, used to implement the steps of the attack detection method described above when executing the computer program.
[0038] This application also provides a storage medium storing a computer program, which, when executed by a processor, implements the steps of the attack detection method described above.
[0039] As can be seen, this application establishes extended systems on both the controlled side and the monitoring side, maps the inputs and outputs of each side to encoding functions, and integrates them into the corresponding extended system to obtain the time-varying control law of each side. Whether an attack exists is then determined from the time-varying control laws of the two sides. This application can effectively detect covert erroneous data injection attacks in the system, with a detection rate higher than that of existing dynamic watermarking detection and moving target detection methods. Furthermore, it does not inject dynamic watermark signals at the controller input and therefore does not degrade the tracking performance of the control system; it requires no additional hardware overhead; and it generates system encoding information in real time without a preset security encoding seed, avoiding the security risks caused by leakage.
[0040] In addition, this application also provides an attack detection apparatus, device, and storage medium, which have the same beneficial effects as described above.
Attached Figure Description
[0041] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of this application. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.
[0042] Figure 1 is a cyber-physical system architecture diagram provided in an embodiment of this application;
[0043] Figure 2 is a flowchart of an attack detection method provided in an embodiment of this application;
[0044] Figure 3 is a schematic diagram of an attack detection principle provided in an embodiment of this application;
[0045] Figure 4 is an example diagram of a security state assessment provided in an embodiment of this application;
[0046] Figure 5 is a schematic structural diagram of an attack detection apparatus provided in an embodiment of this application;
[0047] Figure 6 is a schematic structural diagram of an attack detection device provided in an embodiment of this application.
Detailed Implementation
[0048] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.
[0049] A cyber-physical system can be divided into a monitoring side and a controlled side, which communicate with each other over a network. Figure 1 shows a cyber-physical system architecture diagram provided in an embodiment of this application. The controlled side outputs sensor data $y(k)$; an attacker injects the attack vector $a_y$, so the monitoring side receives the feedback data $y_a(k)$, i.e., the tampered output data. The monitoring side outputs the control command $u(k)$; an attacker injects the attack vector $a_u$, so the controlled side receives the tampered control command $u_a(k)$. A chi-square observer on the monitoring side calculates the residual and compares it with a chi-square-distributed detection threshold to determine whether an attack has occurred.
[0050] This application proposes an attack detection method that can efficiently and accurately detect covert erroneous data injection attacks in cyber-physical systems. Please refer to Figure 2, which is a flowchart of an attack detection method provided in an embodiment of this application. The method may include:
[0051] S101: Map the inputs and outputs of the controlled side to the encoding function and integrate them into the extended system of the controlled side to obtain the time-varying control law of the controlled side.
[0052] S102: Map the inputs and outputs of the monitoring side to the encoding function and integrate them into the extended system of the monitoring side to obtain the time-varying control law of the monitoring side.
[0053] The execution subject in this embodiment is a terminal. This embodiment does not limit the type of terminal, as long as it can perform the attack detection operations, nor does it limit the terminal's location: the terminal can be installed on the monitoring side, or it can be independent of both the monitoring side and the controlled side and act as a third party. For ease of detection and to save space and cost, the terminal can be installed on the monitoring side. This embodiment does not limit the execution order of steps S101 and S102; generally, they are executed synchronously.
[0054] In this embodiment, an extended system is built on both the controlled side and the monitoring side, and the time-varying control law of each side is calculated. The construction and calculation processes are the same on both sides and are performed synchronously. This embodiment is described using the controlled side as an example; the process on the monitoring side is similar.
[0055] This embodiment does not limit the encoding function. For example, it can be a linear function, a non-linear function, or other encoding functions.
[0056] Furthermore, to make the encoding harder for an attacker to crack, mapping the inputs and outputs of the controlled side to an encoding function and integrating them into the extended system of the controlled side to obtain the time-varying control law of the controlled side may specifically include:
[0057] The inputs and outputs of the controlled side are mapped to a nonlinear coding function and integrated into the extended system of the controlled side to calculate the time-varying control law of the controlled side.
[0058] In this embodiment, a non-linear encoding function is used to map the input and output and integrate them into the extended system to obtain a time-varying control law.
[0059] The general formula for the extended systems on both the controlled and monitoring sides is:
[0060]
[0061]
[0062] In the above formula, $A$, $B$, and $C$ are the linear state-space matrices of the controlled system; $x(k)$ and $x(k+1)$ are the state vectors of the controlled system; $X_{e,d}(k+1)$ is the extended system state variable; $u(k)$ is the control input of the controlled system; $w(k)$ is the process noise of the controlled system; $v(k)$ is the sensor measurement noise; and $y(k)$ is the output of the controlled system.
[0063] The matrices of the extended system, including $C_e$, are the state-space matrices of the time-varying extended system after the encoding function is added; $X_{e,d}(k)$ is the extended system state vector; $u_e(k)$ is the extended system control input; $w_e(k)$ is the extended system process noise; $v_e(k)$ is the extended system measurement noise; and $Y_{e,d}(k)$ is the output of the extended system. When this general formula is applied to the controlled side, the extended system refers to the extended system of the controlled side; when it is applied to the monitoring side, the extended system refers to the extended system of the monitoring side.
[0064] where:
[0065]
[0066]
[0067]
[0068]
[0069] In the above formulas, $I$ is an identity matrix of appropriate dimensions, and the encoding term is a nonlinear matrix of appropriate dimensions (the nonlinear coding function). When an attack exists, the attack changes the encoding terms, so that the corresponding vectors on the monitoring side and the controlled side differ; there is one instance for the controlled side and one for the monitoring side. $A_e$ and $B_e$ are the state-space matrices of the time-invariant extended system; $x_e(k)$ is the state variable of the extended system; $\mathbb{R}^{n_e}$ is the $n_e$-dimensional real number field, and $\mathbb{R}^{m_e}$ is the $m_e$-dimensional real number field.
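As a hedged sketch, the plant symbols defined above ($A$, $B$, $C$, $x$, $u$, $w$, $v$, $y$) are consistent with the standard discrete-time linear state-space model below. This form is an assumption inferred from those definitions rather than a formula taken from the application; the extended system presumably has the same structure, with $X_{e,d}$, $u_e$, $w_e$, $v_e$, and $Y_{e,d}$ in place of the plant quantities.

```latex
\begin{aligned}
x(k+1) &= A\,x(k) + B\,u(k) + w(k),\\
y(k)   &= C\,x(k) + v(k).
\end{aligned}
```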
[0070] Furthermore, to ensure the accuracy and efficiency of the time-varying control law, mapping the inputs and outputs of the controlled side to a nonlinear coding function and integrating them into the extended system of the controlled side to obtain the time-varying control law of the controlled side may specifically include:
[0071] Step 31: Use a nonlinear coding function to map the inputs and outputs of the controlled side to the extended system matrix of the controlled side;
[0072] Step 32: Obtain the time-varying control law of the controlled side by solving the extended system matrix of the controlled side through reinforcement learning.
[0073] Figure 3 is a schematic diagram of an attack detection principle provided in an embodiment of this application. Figure 3(a) shows the no-attack scenario. The upper side is the monitoring side and the lower side is the controlled side; the controlled side is used as the example here, and the monitoring side is similar. $u$ is the input of the controlled side, $y$ is the output of the controlled side, and $C_e$ is the state-space matrix of the extended system on the controlled side. Figure 3(b) shows the attack scenario, in which $y_a$ is the feedback data from the controlled side, i.e., the tampered output data, and $u_a$ is the tampered control command from the monitoring side.
[0074] This embodiment uses reinforcement learning to extend the chain "attack → nonlinear coding function → extended system matrix" to "attack → nonlinear coding function → extended system matrix → time-varying control law", so that covert erroneous data injection attacks in the system are detected more efficiently. The extended-system time-varying control law used for detection can be written as:
[0075] $\pi(X_{e,d}, u_e) := u_e(k) = F_{e,d}(k)\,X_{e,d}(k)$;
[0076] In the above formula, $\pi(X_{e,d}, u_e)$ is the control policy in reinforcement learning theory, and $F_{e,d}(k)$ is the time-varying control law of the extended system, with one instance for the monitoring side and one for the controlled side.
[0077] Furthermore, to better solve for the time-varying control law, the above-mentioned method of obtaining the time-varying control law of the controlled side through reinforcement learning based on the extended system matrix of the controlled side may include the following steps, specifically:
[0078] Step 41: Obtain the time-varying control law of the controlled side by applying the Q-learning algorithm to the extended system matrix of the controlled side, where solving with the Q-learning algorithm includes:
[0079] Step 42: Obtain the action value function based on the reward function;
[0080] Step 43: Use the linear basis to approximate and partially differentiate the action value function to obtain the time-varying control law of the controlled side.
[0081] This embodiment sets the reward function in the form of the loss function in the LQR algorithm (linear quadratic regulator) to solve for the time-varying control law. The reward function is:
[0082]
[0083] In the above formula, the left-hand side is the reward function in reinforcement learning theory; $\gamma^{k-i}$ is the decay factor, with $\gamma$ taking a value in $(0,1)$ to ensure convergence of the algorithm; $Q_e$, $Q_d$, and $R$ are the weight matrices corresponding to $x$, $X_e$, and $u_e$, respectively; $\eta$ is the length of the data to be recorded; $\mathbb{R}^{2n_e}$ is the $2n_e$-dimensional real number field; and $\mathbb{R}$ is the real number field.
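As a rough illustration of an LQR-style reward with a decay factor, the sketch below sums discounted quadratic stage costs over a recorded window, weighting the sample at step i by gamma^(k-i). The function names, the use of a single state weight `Q`, and the exact windowing are assumptions made for illustration; they are not taken from the application's own reward formula.

```python
def quad_form(v, M):
    """Return v^T M v for a vector v (list) and a square matrix M (nested lists)."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))


def discounted_lqr_reward(states, inputs, Q, R, gamma):
    """Discounted sum of LQR-style stage costs over a recorded window.

    states[i] and inputs[i] are the samples at step i; the last sample is
    step k, so the sample at step i is weighted by gamma ** (k - i).
    Assumed form (hypothetical): sum_i gamma^(k-i) * (x_i^T Q x_i + u_i^T R u_i).
    """
    k = len(states) - 1
    total = 0.0
    for i, (x, u) in enumerate(zip(states, inputs)):
        stage_cost = quad_form(x, Q) + quad_form(u, R)
        total += gamma ** (k - i) * stage_cost
    return total


if __name__ == "__main__":
    # Two recorded samples of a scalar system (window length 2).
    reward = discounted_lqr_reward(
        states=[[1.0], [2.0]],
        inputs=[[0.0], [1.0]],
        Q=[[1.0]],
        R=[[1.0]],
        gamma=0.5,
    )
    print(reward)
```

Choosing gamma strictly between 0 and 1 makes older samples count less, which matches the stated role of the decay factor in keeping the algorithm convergent.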
[0084] The action value Q function is:
[0085]
[0086] In the above formula, $Q(X_{e,d}, u_e)$ is the action value function in reinforcement learning theory. For greater rigor, the action value function in this embodiment incorporates the state transition probability matrix.
[0087] This embodiment approximates the Q function with a linear basis, which can be written as:
[0088]
[0089] In the above formula, $W(k)$ is the linear basis, specifically the column-stacked vector $\mathrm{vec}(H(k))$ of the unknown matrix $H$; $\Phi(Z_e(k))$ is the Kronecker product formed from the vector $Z_e(k)$; $H_{XX}(k)$, $H_{Xu}(k)$, $H_{uX}(k)$, and $H_{uu}(k)$ are the blocks of the unknown matrix $H$, whose dimensions depend on $X_{e,d}$ and $u_e$; and the unknown matrix $H$ is the matrix to be solved.
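The linear-basis idea can be checked numerically with a minimal sketch. It assumes that the basis vector is the Kronecker product of the stacked state-input vector with itself, so that W^T Φ(Z) equals the quadratic form Z^T H Z; that choice and all function names are illustrative assumptions, not details confirmed by the application.

```python
def vec(H):
    """Stack the columns of matrix H (nested lists) into one flat list."""
    rows, cols = len(H), len(H[0])
    return [H[i][j] for j in range(cols) for i in range(rows)]


def kron(a, b):
    """Kronecker product of two vectors given as flat lists."""
    return [ai * bj for ai in a for bj in b]


def q_linear_basis(H, z):
    """Evaluate Q = W^T Phi(z) with W = vec(H) and Phi(z) = kron(z, z)."""
    w = vec(H)
    phi = kron(z, z)
    return sum(wi * pi for wi, pi in zip(w, phi))


def q_quadratic(H, z):
    """Evaluate the same value directly as the quadratic form z^T H z."""
    n = len(z)
    return sum(z[i] * H[i][j] * z[j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    H = [[2.0, 1.0], [1.0, 3.0]]   # hypothetical H matrix
    z = [1.0, 2.0]                 # hypothetical stacked [X; u] vector
    print(q_linear_basis(H, z), q_quadratic(H, z))
```

Because Q is linear in vec(H), the unknown entries of H can be fitted from recorded data by ordinary least squares, which is the usual reason for choosing this parameterization.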
[0090] The time-varying control law $F_{e,d}(k)$ of the extended system is:
[0091]
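For orientation, the derivation below shows how a control law follows from a quadratic action value function in standard LQR-based Q-learning. It is a sketch under stated assumptions ($H(k)$ symmetric, $H_{uu}(k)$ invertible, and $Q$ exactly quadratic in the stacked vector), not necessarily the application's exact expression.

```latex
\begin{aligned}
Q(X_{e,d},u_e) &\approx
\begin{bmatrix} X_{e,d} \\ u_e \end{bmatrix}^{T}
\begin{bmatrix} H_{XX}(k) & H_{Xu}(k) \\ H_{uX}(k) & H_{uu}(k) \end{bmatrix}
\begin{bmatrix} X_{e,d} \\ u_e \end{bmatrix},\\
\frac{\partial Q}{\partial u_e} &= 2\,H_{uX}(k)\,X_{e,d} + 2\,H_{uu}(k)\,u_e = 0
\;\Longrightarrow\;
u_e = -H_{uu}(k)^{-1} H_{uX}(k)\,X_{e,d},\\
F_{e,d}(k) &= -H_{uu}(k)^{-1} H_{uX}(k).
\end{aligned}
```

The result has the form $u_e(k) = F_{e,d}(k)\,X_{e,d}(k)$, matching the control policy defined earlier in the description.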
[0092] S103: The first residual value is calculated based on the time-varying control law of the controlled side and the time-varying control law of the monitoring side.
[0093] S104: It is determined that an attack exists when the first residual value is greater than the threshold.
[0094] Using the methods described above, the time-varying control law of the monitoring side and the time-varying control law of the controlled side are solved respectively; the first residual value is then calculated and compared with the chi-square-distributed detection threshold $J_{th}$ to determine whether an attack has occurred.
[0095]
[0096]
[0097]
[0098] Furthermore, to ensure more accurate attack detection, the following steps may be included, specifically:
[0099] Step 51: Obtain the feedback data from the controlled side and the state data from the monitoring side;
[0100] Step 52: Calculate the second residual value based on the feedback data and the state data;
[0101] Step 53: Calculate the total residual value based on the first residual value and the second residual value;
[0102] Step 54: Determine that an attack exists when the total residual value is greater than the threshold.
[0103] This embodiment calculates the second residual value from the feedback data of the controlled side and the state data of the monitoring side. The attack detection function is constructed on the basis of a $\chi^2$ detector, using the following formulas:
[0104] $r_0(k) = y_a(k) - C\,x_m(k)$;
[0105]
[0106] In the above formulas, $y_a$ denotes the feedback data from the controlled side; $x_m$ denotes the state data of the monitoring side; the remaining symbols denote the second residual value and the covariance matrix of the residual $r_0$; and $C$ is the system output matrix.
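A conventional chi-square detector built on the residual r0(k) = y_a(k) - C x_m(k) can be sketched as follows. The quadratic statistic r0^T Σ^{-1} r0, the function names, and the example numbers are assumptions based on the standard chi-square detector; the application's own second-residual formula may differ.

```python
def output_residual(y_a, C, x_m):
    """Residual r0 = y_a - C x_m for list-based vectors and matrices."""
    return [y - sum(c * x for c, x in zip(row, x_m)) for y, row in zip(y_a, C)]


def chi_square_statistic(r, sigma_inv):
    """Quadratic form r^T Sigma^{-1} r; sigma_inv is the inverse covariance."""
    n = len(r)
    return sum(r[i] * sigma_inv[i][j] * r[j] for i in range(n) for j in range(n))


def attack_detected(y_a, C, x_m, sigma_inv, threshold):
    """Flag an attack when the statistic exceeds the detection threshold."""
    r0 = output_residual(y_a, C, x_m)
    return chi_square_statistic(r0, sigma_inv) > threshold


if __name__ == "__main__":
    C = [[1.0, 0.0]]          # hypothetical output matrix (one sensor)
    x_m = [1.0, 5.0]          # hypothetical monitoring-side state estimate
    sigma_inv = [[100.0]]     # residual variance 0.01, so the inverse is 100
    threshold = 3.84          # about the 95% quantile of chi-square, 1 d.o.f.
    print(attack_detected([1.1], C, x_m, sigma_inv, threshold))
    print(attack_detected([1.2], C, x_m, sigma_inv, threshold))
```

Passing the inverse covariance directly keeps the sketch free of a matrix-inversion routine; in practice it would be computed once from the filter's innovation covariance.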
[0107] In this embodiment, the total residual value is calculated based on the first residual value and the second residual value. The specific formula is as follows:
[0108] $r_{F,0}(k) = r_F(k) + r_0(k)$
[0109]
[0110]
[0111] λ is the coefficient of the first residual value.
[0112] Furthermore, in order to assess the security state of the controlled side after an attack has been confirmed, the following steps may also be included after it is determined that an attack exists:
[0113] Step 61: Rewrite the extended system matrix of the monitoring side to obtain the security state estimation dynamic equation, where the security state estimation matrix in that equation includes the attack injection vectors;
[0114] Step 62: Obtain the true estimated state vector of the controlled side using the Kalman filter method and the security state estimation dynamic equation.
[0115] The extended system matrix of the monitoring side can be rewritten as:
[0116]
[0117] When n=2, the specific calculation is as follows:
[0118]
[0119]
[0120]
[0121]
[0122] That is:
[0123]
[0124] where the above expression is the security state estimation dynamic equation; $O_Y$ is the output vector; the security state estimation matrix includes the attack injection vectors $a_u$ and $a_y$; $a_u$ is the attack vector injected by the attacker into the data transmitted from the monitoring side to the controlled-side actuator; $a_y$ is the attack vector injected by the attacker into the data transmitted from the controlled-side sensor to the monitoring side; and $O_{X,u}$ is the input vector.
[0125] Furthermore, the problem of estimating the security state of a control system under attack is transformed into a minimum mean square error problem. Obtaining the true estimated state vector of the controlled side through the Kalman filter method and the security state estimation dynamic equation may specifically include:
[0126] Step 71: Take the minimum of the mean square error estimate of the output vector in the security state estimation dynamic equation as the first calculation formula;
[0127] Step 72: Solve the first calculation formula using the gradient descent method to obtain the security state vector;
[0128] Step 73: Obtain the true estimated state vector of the controlled side based on the Kalman filter method and the security state vector.
[0129] By obtaining the correct attack injection vectors, this embodiment can further estimate the true system state of the controlled side using the Kalman filter method. The attack injection vectors $a_u$ and $a_y$ are calculated through the security state estimation dynamic equation; the first calculation formula is:
[0130]
[0131]
[0132]
[0133]
[0134] where $\theta_{a_u}$ and $\theta_{a_y}$ are the parameterized attack injection vectors $a_u$ and $a_y$; $\theta$ is the security state variable, i.e., their parameterization; $\varepsilon$ is a one-dimensional positive real number, a very small constant such as 0.001; and $\mathbb{R}^{n+m}$ is the $(n+m)$-dimensional real number field.
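To illustrate solving a minimum mean square error problem by gradient descent, the sketch below estimates a constant sensor-side attack bias from a batch of output residuals. The constant-bias attack model, the scalar setting, the step size, and the stopping tolerance are simplifying assumptions chosen for illustration; they are not the application's first calculation formula.

```python
def estimate_attack_bias(residuals, step=0.1, tol=1e-3, max_iter=10_000):
    """Minimise mean((r - theta)^2) over theta by plain gradient descent.

    residuals: observed output residuals y_a(k) - C x_m(k) (scalars).
    tol: small stopping constant on the gradient magnitude (assumed role).
    Returns the estimated constant bias theta.
    """
    theta = 0.0
    n = len(residuals)
    for _ in range(max_iter):
        # Gradient of the mean square error with respect to theta.
        grad = -2.0 * sum(r - theta for r in residuals) / n
        if abs(grad) < tol:
            break
        theta -= step * grad
    return theta


if __name__ == "__main__":
    # Hypothetical residuals produced by a bias attack of about 1.0.
    theta_hat = estimate_attack_bias([0.9, 1.0, 1.1])
    print(round(theta_hat, 3))
```

For this convex cost the iteration converges to the sample mean of the residuals; the estimate can then be subtracted from the received measurement before the Kalman update.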
[0135] At this point, based on the Kalman filter method, the true state estimate of the controlled side when attacked can be written as:
[0136]
[0137] $K_m(k) = P_m(k)\,C^{T}\left(C\,P_m(k)\,C^{T} + R_{kal}\right)^{-1}$
[0138] $x_m(k) = x_m(k) + K_m(k)\left(y_a(k) - \theta_{a_y} - C\,x_m(k)\right)$
[0139] $P_m(k+1) = A\left(I - K_m(k)\,C\right)P_m(k)\,A^{T} + Q_{kal}$
[0140] $r_0(k) = y_a(k) - C\,x_m(k)$;
[0141] In the above formulas, $P_m$ is the covariance matrix of the Kalman filter; $K_m$ is the Kalman filter gain; $Q_{kal}$ and $R_{kal}$ are the weight matrices of the Kalman filter; and $C$ is the system output matrix.
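The gain, update, and covariance equations above can be exercised with a scalar sketch in which the estimated sensor attack is subtracted from the received measurement. The scalar simplification, the function name, and the state prediction step `A * x + B * u` are assumptions made for illustration, since the prediction equation is not stated in the text.

```python
def attack_compensated_kalman_step(x_pred, P_pred, y_a, theta_ay,
                                   A, C, Q_kal, R_kal, B=0.0, u=0.0):
    """One scalar Kalman iteration with the estimated attack removed.

    x_pred, P_pred: predicted state and covariance at step k.
    y_a: received (possibly tampered) measurement.
    theta_ay: estimated sensor-side attack injection.
    Returns (x_upd, x_next, P_next, K).
    """
    # Gain: K = P C^T (C P C^T + R)^-1, written for scalars.
    K = P_pred * C / (C * P_pred * C + R_kal)
    # Measurement update on the attack-compensated innovation.
    x_upd = x_pred + K * (y_a - theta_ay - C * x_pred)
    # Covariance propagation: P(k+1) = A (I - K C) P A^T + Q.
    P_next = A * (1.0 - K * C) * P_pred * A + Q_kal
    # Assumed prediction step (not given in the text).
    x_next = A * x_upd + B * u
    return x_upd, x_next, P_next, K


if __name__ == "__main__":
    x_upd, x_next, P_next, K = attack_compensated_kalman_step(
        x_pred=0.0, P_pred=1.0, y_a=3.0, theta_ay=1.0,
        A=1.0, C=1.0, Q_kal=0.0, R_kal=1.0,
    )
    print(x_upd, x_next, P_next, K)
```

With the attack estimate removed, the innovation reflects only the genuine measurement, so the filter tracks the true state of the controlled side rather than the tampered data.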
[0142] The attack detection method provided in this application maps the inputs and outputs of the controlled side to an encoding function and integrates them into the extended system of the controlled side to obtain the time-varying control law of the controlled side; maps the inputs and outputs of the monitoring side to an encoding function and integrates them into the extended system of the monitoring side to obtain the time-varying control law of the monitoring side; calculates a first residual value based on the two time-varying control laws; and determines that an attack exists when the first residual value is greater than a threshold. This application can effectively detect covert erroneous data injection attacks in the system, with a detection rate higher than that of existing dynamic watermarking detection and moving target detection methods. It does not inject dynamic watermark signals at the controller input and therefore does not degrade the tracking performance of the control system; it requires no additional hardware overhead; and it generates system encoding information in real time without a preset security encoding seed, avoiding the security risks caused by leakage. Furthermore, the nonlinear encoding function makes the encoding harder for an attacker to crack, which improves security. Reinforcement learning is used to solve for the time-varying control law, ensuring both accuracy and computational efficiency. The use of the reward function, the action value function, and the linear-basis approximation with partial differentiation ensures consistent convergence between the controlled side and the monitoring side. The second residual value is calculated from the feedback data of the controlled side and the state data of the monitoring side, and the existence of an attack is determined from both the first and second residual values, which makes the result more accurate. After an attack is confirmed, a security state assessment is performed on the controlled side; this assessment is integrated into the widely used Kalman filter, reducing overhead and improving system security. Finally, the problem of estimating the security state of a controlled system under attack is transformed into a minimum mean square error problem, which simplifies computation.
[0143] To make this application easier to understand, please refer to Figure 4, which is an example diagram of a security state assessment provided in an embodiment of this application. In the diagram, the side above the network is the controlled side and the side below is the monitoring side. The controlled side includes: the attack injection vector $a_u$ from the monitoring side; the input $u_p(k)$ before the controlled side is attacked; the input (control command) $u_a(k)$ after the controlled side is attacked, i.e., tampered; $y_a$, the feedback data from the controlled side, i.e., the tampered output data; the encoding function on the controlled side; the online autoencoder detector on the controlled side, whose detection objects include the output of the controlled-side extended system; and the controlled-side extended system.
[0144] The monitoring side includes: the attack injection vector $a_y$ on the monitoring side; the input $y_m(k)$ before the monitoring side is attacked; the input data after the monitoring side is attacked, i.e., the feedback data $y_a(k)$ from the controlled side; the encoding function on the monitoring side; the online autoencoder detector on the monitoring side, whose detection objects include the output of the monitoring-side extended system; and the monitoring-side extended system. The chi-square observer on the monitoring side calculates the residual and compares it with the chi-square-distributed detection threshold to determine whether an attack has occurred; if an attack has occurred, a security state assessment is performed. Control is carried out using a nominal controller.
[0145] The attack detection apparatus provided in the embodiments of this application is described below. The attack detection apparatus described below and the attack detection method described above may be referred to in correspondence with each other.
[0146] Please refer to Figure 5, which is a schematic structural diagram of an attack detection apparatus provided in an embodiment of this application. The apparatus may include:
[0147] The controlled-side mapping and integration module 100 is used to map the inputs and outputs of the controlled side to the encoding function and integrate them into the extended system of the controlled side to obtain the time-varying control law of the controlled side.
[0148] The monitoring-side mapping and integration module 200 is used to map the inputs and outputs of the monitoring side to the encoding function and integrate them into the extended system of the monitoring side to obtain the time-varying control law of the monitoring side.
[0149] The calculation module 300 is used to calculate the first residual value based on the time-varying control law of the controlled side and the time-varying control law of the monitoring side.
[0150] The judgment module 400 is used to determine that an attack exists when the first residual value is greater than a threshold.
[0151] Based on the above embodiments, the controlled-side mapping integration module 100 may include:
[0152] The mapping and integration unit is used to map the inputs and outputs of the controlled side to a nonlinear coding function and integrate it into the extended system of the controlled side to calculate the time-varying control law of the controlled side.
[0153] Based on the above embodiments, the mapping and integration unit may include:
[0154] The mapping subunit is used to map the inputs and outputs of the controlled side to the extended system matrix of the controlled side using the nonlinear encoding function;
[0155] The solution subunit is used to obtain the time-varying control law of the controlled side by solving the extended system matrix of the controlled side through reinforcement learning.
[0156] Based on the above embodiments, the solving subunit may include:
[0157] The algorithm solution subunit is used to obtain the time-varying control law of the controlled side by using the Q-learning algorithm based on the extended system matrix of the controlled side; the process of obtaining the time-varying control law of the controlled side by using the Q-learning algorithm includes:
[0158] The action value function determination sub-unit is used to obtain the action value function based on the reward function;
[0159] The approximate partial derivative solving sub-unit is used to approximate the action value function with a linear basis, take its partial derivative, and solve for the time-varying control law of the controlled side.
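The linear-basis approximation and the partial-derivative step can be illustrated with the standard quadratic Q-function used in Q-learning for linear systems. The sketch below makes several assumptions that are not taken from this application: the reward is treated as a quadratic cost to be minimized, the control input is scalar, H is symmetric with a positive input block, and the numerical values of H are hypothetical. With w = [z; u], the action value is Q(z, u) = vec(H)ᵀ (w ⊗ w), and setting ∂Q/∂u = 0 gives u = -H_uz z / H_uu.

```python
def kron(v, w):
    """Kronecker product of two vectors, returned as a flat list."""
    return [a * b for a in v for b in w]

def q_value(H, z, u):
    """Q(z, u) = vec(H)^T (w kron w) with w = [z; u], i.e. the quadratic form w^T H w."""
    w = list(z) + [u]
    basis = kron(w, w)                      # linear basis built from the Kronecker product
    h_vec = [h for row in H for h in row]   # row-major vectorization of H
    return sum(h * p for h, p in zip(h_vec, basis))

def control_gain(H):
    """Gain K = H_uz / H_uu for a scalar input (last row/column of H is the input block)."""
    n = len(H) - 1
    h_uu = H[n][n]
    return [H[n][j] / h_uu for j in range(n)]

def control_input(H, z):
    """Control law u = -K z obtained by setting the partial derivative dQ/du to zero."""
    return -sum(k * zi for k, zi in zip(control_gain(H), z))

# Hypothetical symmetric H for two states and one input.
H = [[2.0, 0.0, 1.0],
     [0.0, 3.0, 0.5],
     [1.0, 0.5, 2.0]]
z = [1.0, 2.0]
u_star = control_input(H, z)
```

In a learning setting H would be estimated from recorded data rather than given, and the gain would be recomputed as H is updated, which is what makes the resulting control law time-varying.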
[0160] Based on the above embodiments, the attack detection device may further include:
[0161] The acquisition module is used to acquire the feedback data from the controlled side and the state data from the monitoring side before it is determined that an attack exists.
[0162] The second residual value calculation module is used to calculate the second residual value based on the feedback data and the state data.
[0163] The total residual value calculation module is used to calculate the total residual value based on the first residual value and the second residual value.
[0164] The attack detection module is used to determine that an attack exists when the total residual value is greater than the threshold.
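A minimal sketch of combining the two residuals follows. How the total residual is formed is not specified in this passage, so the weighted sum, the Euclidean norm used for the second residual, and all numerical values are assumptions made only for illustration.

```python
import math

def second_residual(feedback, predicted):
    """Assumed form: Euclidean distance between controlled-side feedback data
    and the output predicted from the monitoring-side state data."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(feedback, predicted)))

def total_residual(first, second, weight_first=1.0, weight_second=1.0):
    """Assumed combination: weighted sum of the first and second residual values."""
    return weight_first * first + weight_second * second

THRESHOLD = 0.3          # hypothetical detection threshold
first = 0.2              # hypothetical first residual (control-law mismatch)
second = second_residual([1.0, 2.0], [1.0, 1.8])
total = total_residual(first, second)
attack = total > THRESHOLD
```

In this example neither residual exceeds the threshold on its own, but their combination does, which illustrates why using both residuals can make the decision more sensitive.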
[0165] Based on any of the above embodiments, the attack detection device may further include:
[0166] The rewriting module is used to rewrite the extended system matrix of the monitoring side to obtain the security state estimation dynamic equation; the security state estimation matrix in the security state estimation dynamic equation includes the attack injection vector;
[0167] The true estimated state vector solving module is used to obtain the true estimated state vector of the controlled side through the Kalman filter method and the security state estimation dynamic equation.
[0168] Based on the above embodiments, the module for solving the true estimated state vector may include:
[0169] The first calculation formula determination unit is used to take the minimization of the mean square error of the output vector estimate in the security state estimation dynamic equation as the first calculation formula.
[0170] The security state vector solving unit is used to solve the first calculation formula using the gradient descent method to obtain the security state vector.
[0171] The true estimated state vector solving unit is used to obtain the true estimated state vector of the controlled side based on the Kalman filter method and the security state vector.
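The three units above can be illustrated with a scalar sketch. It is a simplification under stated assumptions and not the security state estimation dynamic equation of this application: the attack is modeled as a constant bias added to the measurement, the output predictions are assumed to come from a nominal model, and the system and noise parameters are hypothetical. Gradient descent minimizes the mean square output error to estimate the bias, and a standard scalar Kalman filter then runs on the corrected measurements.

```python
def estimate_attack_bias(measurements, predictions, step=0.1, iterations=200):
    """Gradient descent on J(a) = mean((y - a - y_pred)^2) over the bias a."""
    a = 0.0
    n = len(measurements)
    for _ in range(iterations):
        grad = -2.0 / n * sum(y - a - p for y, p in zip(measurements, predictions))
        a -= step * grad
    return a

def kalman_step(x_est, p_est, y, a_model, c, q_noise, r_noise):
    """One predict/update cycle of a scalar Kalman filter."""
    x_pred = a_model * x_est
    p_pred = a_model * p_est * a_model + q_noise
    gain = p_pred * c / (c * p_pred * c + r_noise)
    x_new = x_pred + gain * (y - c * x_pred)
    p_new = (1.0 - gain * c) * p_pred
    return x_new, p_new

# Hypothetical data: true state 1.0, measurements shifted by an injected bias of 0.5.
measurements = [1.5] * 20
predictions = [1.0] * 20
bias = estimate_attack_bias(measurements, predictions)

x_corrected, p_corrected = 0.0, 1.0
x_naive, p_naive = 0.0, 1.0
for y in measurements:
    # Filter on measurements with the estimated bias removed.
    x_corrected, p_corrected = kalman_step(x_corrected, p_corrected, y - bias, 1.0, 1.0, 0.01, 0.1)
    # Filter on raw attacked measurements, for comparison.
    x_naive, p_naive = kalman_step(x_naive, p_naive, y, 1.0, 1.0, 0.01, 0.1)
```

The filter fed with corrected measurements converges toward the true state, whereas the filter fed with raw attacked measurements converges toward the biased value.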
[0172] It should be noted that the order of the modules and units in the above-mentioned attack detection device can be changed without affecting the logic.
[0173] The attack detection device provided in this application includes: a controlled-side mapping and integration module 100, which maps the inputs and outputs of the controlled side to an encoding function and integrates them into the extended system of the controlled side to obtain the time-varying control law of the controlled side; a monitoring-side mapping and integration module 200, which maps the inputs and outputs of the monitoring side to an encoding function and integrates them into the extended system of the monitoring side to obtain the time-varying control law of the monitoring side; a calculation module 300, which calculates a first residual value based on the time-varying control law of the controlled side and the time-varying control law of the monitoring side; and a judgment module 400, which determines that an attack exists when the first residual value is greater than a threshold. Furthermore, the nonlinear encoding function makes it harder for an attacker to recover the encoding, which improves security. Reinforcement learning is used to solve for the time-varying control law, which provides both accuracy and computational efficiency. The use of the reward function, the action value function, and the linear-basis approximation with partial differentiation ensures that the controlled side and the monitoring side converge consistently. The second residual value is calculated from the feedback data from the controlled side and the state data from the monitoring side, and the existence of an attack is determined from both the first and second residual values, which makes the result more accurate. After an attack is confirmed, a security state estimation is performed for the controlled side. This estimation is integrated into the widely used Kalman filter, which reduces overhead and improves system security. Finally, the problem of estimating the security state of the controlled system under attack is transformed into a minimum mean square error problem, which simplifies computation.
[0174] The attack detection device provided in the embodiments of this application is described below. The attack detection device described below and the attack detection method described above can be referred to in correspondence.
[0175] Referring to Figure 6, which is a structural diagram of the attack detection device provided in an embodiment of this application, the device may include:
[0176] Memory 10 is used to store computer programs;
[0177] Processor 20 is used to execute computer programs to implement the attack detection method described above.
[0178] The memory 10, processor 20, and communication interface 31 all communicate with each other through the communication bus 32.
[0179] In this embodiment, the memory 10 is used to store one or more programs. The programs may include program code, which includes computer operation instructions. In this embodiment, the memory 10 may store programs for implementing the following functions:
[0180] The inputs and outputs of the controlled side are mapped to the encoding function and integrated into the extended system of the controlled side to obtain the time-varying control law of the controlled side;
[0181] The inputs and outputs of the monitoring side are mapped to the encoding function and integrated into the extended system of the monitoring side to obtain the time-varying control law of the monitoring side;
[0182] The first residual value is calculated based on the time-varying control law of the controlled side and the time-varying control law of the monitoring side;
[0183] It is determined that an attack exists when the first residual value is greater than the threshold.
[0184] In one possible implementation, the memory 10 may include a program storage area and a data storage area, wherein the program storage area may store the operating system and applications required for at least one function; and the data storage area may store data created during use.
[0185] Furthermore, memory 10 may include read-only memory and random access memory, providing instructions and data to the processor. A portion of the memory may also include NVRAM. The memory stores operating systems and operating instructions, executable modules, or data structures, or subsets thereof, or extended sets thereof, wherein the operating instructions may include various operating instructions for implementing various operations. The operating system may include various system programs for implementing various basic tasks and handling hardware-based tasks.
[0186] Processor 20 can be a central processing unit (CPU), an application-specific integrated circuit, a digital signal processor, a field-programmable gate array, or other programmable logic device. Processor 20 can be a microprocessor or any conventional processor. Processor 20 can call programs stored in memory 10.
[0187] Communication interface 31 can be an interface for the communication module, used to connect with other devices or systems.
[0188] It should be noted that the structure shown in Figure 6 does not constitute a limitation on the attack detection device in the embodiments of this application. In practical applications, the attack detection device may include more or fewer components than those shown in Figure 6, or a combination of certain components.
[0189] The storage medium provided in the embodiments of this application is described below. The storage medium described below can be referred to in correspondence with the attack detection method described above.
[0190] This application also provides a storage medium storing a computer program, which, when executed by a processor, implements the steps of the attack detection method described above.
[0191] The storage medium can include various media that can store program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0192] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, the description is relatively simple; relevant parts can be referred to in the method section.
[0193] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
[0194] Finally, it should be noted that in this document, relationships such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus.
[0195] The above provides a detailed description of the attack detection method, apparatus, device, and storage medium provided in this application. Specific examples have been used to illustrate the principles and implementation methods of this application. The description of the above embodiments is only for the purpose of helping to understand the method and core ideas of this application. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this application. Therefore, the content of this specification should not be construed as a limitation of this application.
Claims
1. An attack detection method, characterized by comprising:
mapping the inputs and outputs of the controlled side to an encoding function and integrating them into the extended system of the controlled side to obtain the time-varying control law of the controlled side;
mapping the inputs and outputs of the monitoring side to the encoding function and integrating them into the extended system of the monitoring side to obtain the time-varying control law of the monitoring side;
calculating a first residual value based on the time-varying control law of the controlled side and the time-varying control law of the monitoring side;
determining that an attack exists when the first residual value is greater than a threshold;
wherein the step of mapping the inputs and outputs of the controlled side to an encoding function and integrating them into the extended system of the controlled side to obtain the time-varying control law of the controlled side includes: mapping the inputs and outputs of the controlled side to a nonlinear encoding function, integrating them into the extended system of the controlled side, and calculating the time-varying control law of the controlled side;
the step of mapping the inputs and outputs of the controlled side to the nonlinear encoding function, integrating them into the extended system of the controlled side, and calculating the time-varying control law of the controlled side includes: mapping the inputs and outputs of the controlled side to the extended system matrix of the controlled side using the nonlinear encoding function; and solving for the time-varying control law of the controlled side through reinforcement learning based on the extended system matrix of the controlled side;
the step of solving for the time-varying control law of the controlled side through reinforcement learning based on the extended system matrix of the controlled side includes: obtaining the time-varying control law of the controlled side by the Q-learning algorithm based on the extended system matrix of the controlled side; the step of obtaining the time-varying control law of the controlled side by the Q-learning algorithm includes: obtaining the action value function based on the reward function; and approximating the action value function with a linear basis and taking its partial derivative to solve for the time-varying control law of the controlled side;
the general formula of the extended systems on the controlled side and the monitoring side is: [formula omitted]; [formula omitted];
where A, B, and C are the linear state-space matrices of the controlled system; […] and […] are state vectors of the controlled system; […] contains the extended system state variables; […] is the control input of the controlled system; […] is the process noise of the controlled system; […] is the sensor measurement noise; […] is the output of the controlled system; […], […], and […] are the state-space matrices of the time-varying extended system after the encoding function is added; […] is the extended system state vector; […] is the extended system control input; […] is the extended system process noise; […] is the extended system measurement noise; and […] is the extended system output; where: [formula omitted]; [formula omitted]; [formula omitted]; [formula omitted];
in the above formulas, I is the identity matrix; […] is a nonlinear matrix; […] is the nonlinear encoding function; when an attack occurs, the attack causes […] and […] to change, that is, the attack causes the vector […] to differ between the monitoring side and the controlled side; […] denotes the controlled side and […] denotes the monitoring side; […] are constant and are the state-space matrices of the time-invariant extended system; […] is the extended system state variable; […] is the […]-dimensional real number field; and […] is the three-dimensional real number field;
the time-varying control law of the extended system used for detection can be written as: [formula omitted]; in the above formula, […] is the control policy in reinforcement learning theory, and […] is the time-varying control law of the extended system, denoted […] on the monitoring side and […] on the controlled side;
the reward function is: [formula omitted]; [formula omitted]; in the above formulas, […] is the reward function in reinforcement learning theory; […] is the discount factor, whose value lies in (0,1) to ensure convergence of the algorithm; […] and R are the weight matrices corresponding to […], respectively; […] is the length of the data that needs to be recorded; […] denotes the […]-dimensional real number field, and […] denotes the real number field;
the action value function is: [formula omitted]; in the above formula, […] is the action value function in reinforcement learning theory, and […] is the state transition probability matrix;
approximating the action value function with a linear basis gives: [formula omitted]; in the above formula, […] denotes the linear basis, which takes the form of a combination of the column vectors of an unknown matrix H: [formula omitted]; […] is the Kronecker product of the vectors […] and […]; […], […], […], and […] are blocks of the unknown matrix H, whose dimensions depend on […] and […]; the unknown matrix H is the matrix to be solved;
the time-varying control law of the extended system is: [formula omitted]; [formula omitted]; the calculation process is the same on the monitoring side and the controlled side;
the formula for calculating the first residual value is: [formula omitted]; where [formula omitted].
2. The attack detection method according to claim 1, characterized in that, before determining that an attack exists, the method further comprises: acquiring the feedback data from the controlled side and the state data from the monitoring side; calculating a second residual value based on the feedback data and the state data; calculating a total residual value based on the first residual value and the second residual value; and determining that an attack exists when the total residual value is greater than the threshold.
3. The attack detection method according to claim 1 or 2, characterized in that, after determining that an attack exists, the method further comprises: rewriting the extended system matrix of the monitoring side to obtain the security state estimation dynamic equation, wherein the security state estimation matrix in the security state estimation dynamic equation includes the attack injection vector; and obtaining the true estimated state vector of the controlled side through the Kalman filter method and the security state estimation dynamic equation.
4. The attack detection method according to claim 3, characterized in that the process of obtaining the true estimated state vector of the controlled side through the Kalman filter method and the security state estimation dynamic equation includes: taking the minimization of the mean square error of the output vector estimate in the security state estimation dynamic equation as the first calculation formula; solving the first calculation formula using the gradient descent method to obtain the security state vector; and obtaining the true estimated state vector of the controlled side based on the Kalman filter method and the security state vector.
5. An attack detection device, characterized by comprising:
a controlled-side mapping and integration module, used to map the inputs and outputs of the controlled side to an encoding function and integrate them into the extended system of the controlled side to obtain the time-varying control law of the controlled side;
a monitoring-side mapping and integration module, used to map the inputs and outputs of the monitoring side to the encoding function and integrate them into the extended system of the monitoring side to obtain the time-varying control law of the monitoring side;
a calculation module, used to calculate a first residual value based on the time-varying control law of the controlled side and the time-varying control law of the monitoring side;
a judgment module, used to determine that an attack exists when the first residual value is greater than a threshold;
wherein the controlled-side mapping and integration module includes: a mapping and integration unit, used to map the inputs and outputs of the controlled side to a nonlinear encoding function, integrate them into the extended system of the controlled side, and calculate the time-varying control law of the controlled side;
the mapping and integration unit includes: a mapping subunit, used to map the inputs and outputs of the controlled side to the extended system matrix of the controlled side using the nonlinear encoding function; and a solving subunit, used to solve for the time-varying control law of the controlled side through reinforcement learning based on the extended system matrix of the controlled side;
the solving subunit includes: an algorithm solution subunit, used to obtain the time-varying control law of the controlled side by using the Q-learning algorithm based on the extended system matrix of the controlled side; the algorithm solution subunit includes: an action value function determination subunit, used to obtain the action value function based on the reward function; and an approximate partial derivative solution subunit, used to approximate the action value function with a linear basis and take its partial derivative to solve for the time-varying control law of the controlled side;
the general formula of the extended systems on the controlled side and the monitoring side is: [formula omitted]; [formula omitted];
where A, B, and C are the linear state-space matrices of the controlled system; […] and […] are state vectors of the controlled system; […] contains the extended system state variables; […] is the control input of the controlled system; […] is the process noise of the controlled system; […] is the sensor measurement noise; […] is the output of the controlled system; […], […], and […] are the state-space matrices of the time-varying extended system after the encoding function is added; […] is the extended system state vector; […] is the extended system control input; […] is the extended system process noise; […] is the extended system measurement noise; and […] is the extended system output; where: [formula omitted]; [formula omitted]; [formula omitted]; [formula omitted];
in the above formulas, I is the identity matrix; […] is a nonlinear matrix; […] is the nonlinear encoding function; when an attack occurs, the attack causes […] and […] to change, that is, the attack causes the vector […] to differ between the monitoring side and the controlled side; […] denotes the controlled side and […] denotes the monitoring side; […] are constant and are the state-space matrices of the time-invariant extended system; […] is the extended system state variable; […] is the […]-dimensional real number field; and […] is the three-dimensional real number field;
the time-varying control law of the extended system used for detection can be written as: [formula omitted]; in the above formula, […] is the control policy in reinforcement learning theory, and […] is the time-varying control law of the extended system, denoted […] on the monitoring side and […] on the controlled side;
the reward function is: [formula omitted]; [formula omitted]; in the above formulas, […] is the reward function in reinforcement learning theory; […] is the discount factor, whose value lies in (0,1) to ensure convergence of the algorithm; […] and R are the weight matrices corresponding to […], respectively; […] is the length of the data that needs to be recorded; […] denotes the […]-dimensional real number field, and […] denotes the real number field;
the action value function is: [formula omitted]; in the above formula, […] is the action value function in reinforcement learning theory, and […] is the state transition probability matrix;
approximating the action value function with a linear basis gives: [formula omitted]; in the above formula, […] denotes the linear basis, which takes the form of a combination of the column vectors of an unknown matrix H: [formula omitted]; […] is the Kronecker product of the vectors […] and […]; […], […], […], and […] are blocks of the unknown matrix H, whose dimensions depend on […] and […]; the unknown matrix H is the matrix to be solved;
the time-varying control law of the extended system is: [formula omitted]; [formula omitted]; the calculation process is the same on the monitoring side and the controlled side;
the formula for calculating the first residual value is: [formula omitted]; where [formula omitted].
6. An attack detection device, characterized by comprising: a memory, used to store a computer program; and a processor, configured to implement the steps of the attack detection method according to any one of claims 1 to 4 when executing the computer program.
7. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the steps of the attack detection method according to any one of claims 1 to 4.