An artificial intelligence-based network security operation and maintenance abnormal behavior dynamic detection method
By constructing a heterogeneous temporal behavior graph and a temporal heterogeneous graph attention network, and combining the Frenet frame with a neural ordinary differential equation model, the method addresses the incomplete modeling of operation and maintenance logs and the inability of thresholds to adapt in traditional methods. This enables multi-dimensional dynamic anomaly detection of operation and maintenance behavior, reduces the false alarm rate, and improves the adaptability of detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BEIJING DONGXINGYUAN TECHNOLOGY DEVELOPMENT CO LTD
- Filing Date
- 2026-05-08
- Publication Date
- 2026-06-19
AI Technical Summary
Traditional methods struggle to uniformly model the interaction relationships and operation sequence information of heterogeneous entities such as users, target devices, commands, and sessions in operation and maintenance logs. They cannot transform discrete operation and maintenance sequences into continuous behavioral manifold expressions, lack the ability to geometrically characterize the overall behavioral trajectory of a session, and existing anomaly detection methods lack a comprehensive quantification of the deviation between the differential geometric features of behavioral curves and information entropy. They cannot dynamically assess the degree of anomaly, the detection threshold cannot be adaptively adjusted, and the false alarm rate is high.
A heterogeneous temporal behavior graph is constructed, and embedding vectors are generated through a temporal heterogeneous graph attention network. After manifold parameterization, the instantaneous rate of curvature change and the instantaneous torsion are extracted from the Frenet frame. By combining information entropy and dynamic baseline entropy, an adaptive detection threshold is generated using a neural ordinary differential equation model, and the model parameters are optimized through a closed-loop mechanism.
It enables multi-dimensional dynamic anomaly quantification of operational behaviors, reducing false alarm rates and improving the timeliness and adaptability of detection.
Figure CN122247750A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the fields of network security and artificial intelligence technology, specifically an artificial intelligence-based method for dynamic detection of abnormal network security operation and maintenance behaviors.
Background Technology
[0002] In enterprise IT operations and maintenance management, operations and maintenance personnel remotely operate and maintain target assets such as servers and network devices through an operations and maintenance audit system. This network security operations and maintenance audit system is a unified platform that aggregates data collected by various auditing components such as bastion hosts, host audit agents, and database audit gateways. The system records information such as user identifier, target device identifier, command string, and operation timestamp for each operation on a session-by-session basis, forming a large-scale operations and maintenance log. Such logs naturally have heterogeneous entity interactions and strict temporal correlations, constituting complex heterogeneous temporal behavioral data.
[0003] Operational security audits require continuous monitoring of operational behavior to promptly detect internal threats, account compromises, or unauthorized operations. Because operational scenarios involve different roles, target devices of various types, and diverse operational commands, normal behavior patterns change dynamically with business hours, user responsibilities, and system environment. Graph neural networks that model heterogeneous interaction relationships can automatically learn low-dimensional embedding representations of nodes such as users, target devices, commands, and sessions, capturing structural semantics. Arranging the operation embedding vectors within the same session in time order and applying manifold parameterization allows them to be viewed as a continuous curve in a high-dimensional space. The curvature and torsion invariants of the Frenet frame in differential geometry can then be used to characterize the local distortion of the behavioral trajectory. At the same time, information entropy quantifies the uncertainty of the operation sequence, and comparing it with a dynamic baseline entropy derived from the historical behavior of the same role measures the degree of abnormal deviation. In addition, to adapt to environmental changes, the detection threshold needs to be constructed as a continuous dynamic system. A neural ordinary differential equation model can output the time-varying derivative of the threshold based on the current abnormal situation and environmental context, achieving adaptive adjustment. The integration of these technologies provides a foundation for intelligent anomaly detection in complex operational scenarios.
[0004] The following problems exist in the existing technology. Traditional methods struggle to model, in a unified manner, the interaction relationships and operation sequence information of heterogeneous entities such as users, target devices, commands, and sessions in operation and maintenance logs. They are unable to transform discrete operation and maintenance sequences into a continuous behavioral manifold representation, resulting in incomplete behavioral feature extraction and a lack of geometric characterization of the overall behavioral trajectory of the session. Existing anomaly detection methods lack a comprehensive way to quantify the differential geometric features of the behavior curve together with the deviation of information entropy. The three factors of curvature change, torsion, and information entropy deviation are not effectively integrated, making it impossible to dynamically assess the degree of anomaly from the combined perspective of geometric distortion and uncertainty deviation. Traditional detection thresholds often use fixed values or simple rules, which cannot be adaptively adjusted according to the characteristics of the target device and the environmental context such as the network security situation. Furthermore, they lack an effective closed-loop mechanism that feeds alarm verification results back in real time to optimize model parameters, so thresholds lag behind dynamic abnormal situations and the false alarm rate is difficult to reduce continuously.
Summary of the Invention
[0005] The present invention aims to solve at least one of the technical problems existing in the prior art; to this end, the present invention proposes a dynamic detection method for abnormal network security operation and maintenance behavior based on artificial intelligence, which is used to solve the above-mentioned technical problem.
[0006] The first aspect of this invention provides a method for dynamic detection of abnormal network security operation and maintenance behaviors based on artificial intelligence, comprising the following steps:
S1: Obtain operation logs from the network security operation and maintenance audit system, and simultaneously extract target device characteristics and current network security situation information to construct a heterogeneous temporal behavior graph, where the nodes in the heterogeneous temporal behavior graph include user nodes, target device nodes, command nodes and session nodes, and the edges represent the operation relationships recorded in the operation logs;
S2: Encode each operation in the operation log using a preset temporal heterogeneous graph attention network to generate an embedding vector; group the operations into operation and maintenance sessions, and arrange the embedding vectors of the operations within the same session in order of operation time to form the embedding vector sequence of the session; perform manifold parameterization on the embedding vector sequence and apply continuity constraints for correction to obtain the continuous behavior curve of the session, which is twice continuously differentiable in high-dimensional space;
S3: Based on a preset sliding time window, calculate the Frenet frame for the continuous behavior curve segment within the current sliding time window, extract the instantaneous rate of curvature change and the instantaneous torsion, calculate the information entropy based on the embedding vectors within the sliding time window, and obtain the dynamic baseline entropy from the embedding vector statistics of historical operation and maintenance sessions of the same role in the same period;
S4: Take the positive part of the product of the instantaneous rate of curvature change and the instantaneous torsion to obtain the torsion coordination quantity, and construct the exponential decay factor from the absolute value of the difference between the information entropy and the dynamic baseline entropy; multiply the torsion coordination quantity by the exponential decay factor and integrate the product within the sliding time window to obtain the dynamic anomaly index at the current moment;
S5: Input the dynamic anomaly index and the environmental context vector composed of the target device characteristics and the network security situation information into a preset neural ordinary differential equation model to obtain the time-varying derivative of the detection threshold, and generate a continuously changing detection threshold through numerical integration; compare the dynamic anomaly index with the detection threshold in real time, generate an anomaly alarm when the dynamic anomaly index exceeds the detection threshold, obtain the verification result of the anomaly alarm from the preset alarm verification database, convert the verification result into a parameter adjustment signal, and update the parameters of the neural ordinary differential equation model through the adjoint sensitivity method.
[0007] Preferably, step S1 includes the following steps: Whenever an operation log is retrieved from the network security operation and maintenance audit system, the characteristics of the target device involved in the operation log and the current network security situation information are extracted synchronously, and the user identifier, target device identifier, command string, session identifier and operation timestamp in the operation log are parsed online. The command strings are standardized and deduplicated so that multiple command strings with the same semantics are mapped to the same command node; Based on the analysis results of the operation and maintenance logs, user nodes, target device nodes, command nodes, and session nodes that do not currently exist are dynamically created in the heterogeneous time-series behavior graph. At the same time, typed directed time-series edges are established. The typed directed time-series edges include execution edges from user nodes to command nodes, operation edges from command nodes to target device nodes, ownership edges from command nodes to session nodes, and initiation edges from user nodes to session nodes. Each edge carries an operation timestamp.
[0008] Preferably, in step S2, each operation in the operation log is encoded using a preset temporal heterogeneous graph attention network to generate an embedding vector, including the following steps: For each edge in the heterogeneous temporal behavior graph, an edge attribute vector is constructed. The edge attribute vector is formed by concatenating the one-hot encoding of the edge type and the Fourier encoding of the operation timestamp. In each layer of the pre-defined temporal heterogeneous graph attention network, the command node corresponding to the operation and maintenance is taken as the central node, and the node directly connected to the central node is taken as the neighbor node. The query vector and key vector of the central node and the neighbor node are obtained by linear projection of the node type, respectively, and then the dot product operation is performed. The edge attribute vector of the connecting edge is linearly projected and added to the dot product result as a bias term. After normalization, the attention weight is obtained. Then, the value vector of the neighbor node is weighted and aggregated using the attention weight. After residual connection and layer normalization, the result of this layer is output. After stacking multiple layers, the embedding vector is obtained.
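The single attention layer described above can be sketched roughly as follows. This is a minimal single-head NumPy illustration, not the patented network: the Fourier frequencies, the vector sizes, the parameter dictionary layout, and the names `fourier_time_encoding`, `edge_attribute`, `layer_norm` and `attention_layer` are all assumptions made for the example.

```python
import numpy as np

def fourier_time_encoding(timestamp, num_freqs=4):
    # Hypothetical frequency ladder; the source does not specify the frequencies.
    freqs = 2.0 ** -np.arange(num_freqs)
    angles = freqs * timestamp
    return np.concatenate([np.sin(angles), np.cos(angles)])

def edge_attribute(edge_type, edge_types, timestamp):
    # One-hot edge type concatenated with the Fourier encoding of the timestamp.
    one_hot = np.array([1.0 if edge_type == t else 0.0 for t in edge_types])
    return np.concatenate([one_hot, fourier_time_encoding(timestamp)])

def layer_norm(x, eps=1e-5):
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def attention_layer(center, neighbors, params):
    """center: (node_type, vector); neighbors: list of (node_type, vector, edge_attr)."""
    c_type, h_c = center
    q = params["W_q"][c_type] @ h_c              # query projection chosen by node type
    scores, values = [], []
    for n_type, h_n, e in neighbors:
        k = params["W_k"][n_type] @ h_n          # key projection chosen by node type
        scores.append(q @ k + params["w_e"] @ e) # dot product plus projected edge bias
        values.append(params["W_v"][n_type] @ h_n)
    scores = np.array(scores)
    weights = np.exp(scores - scores.max())      # softmax normalization
    weights /= weights.sum()
    aggregated = weights @ np.array(values)      # attention-weighted value aggregation
    return layer_norm(h_c + aggregated), weights # residual connection, then layer norm
```

Stacking several such layers, each with its own parameters, would give the final embedding vector for the command node.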
[0009] Preferably, in step S2, performing manifold parameterization on the embedding vector sequence and applying continuity constraints for correction to obtain a continuous behavior curve of the operation and maintenance session that is twice continuously differentiable in a high-dimensional space includes the following steps: For the embedding vector sequence arranged in chronological order within the operation and maintenance session, each embedding vector in the sequence is taken as a sampling point, and the tangent vector at each sampling point is calculated from the position difference between adjacent sampling points; Between adjacent sampling points, a piecewise cubic Hermite interpolation polynomial is constructed from the positions and tangent vectors of the sampling points, so that adjacent polynomial segments satisfy positional continuity and first-derivative continuity at the joining point. A second-order continuity constraint is then applied to the tangent vectors of all sampling points along the entire curve, and the tangent vector at each point is corrected by solving the three-moment equations, yielding a continuous behavior curve that is twice continuously differentiable.
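As a rough illustration of the tangent correction, the sketch below solves the tridiagonal continuity system directly for the tangents and then evaluates the Hermite segments. It assumes uniform parameter spacing between sampling points and natural (zero second derivative) end conditions, neither of which is stated in the source; it also uses the tangent form of the continuity equations, which is equivalent to the three-moment form for this setting. The function names are hypothetical.

```python
import numpy as np

def c2_tangents(points):
    """Tangents that make the piecewise cubic Hermite curve twice differentiable."""
    p = np.asarray(points, dtype=float)          # shape (n, d), n >= 2
    n = len(p)
    A = np.zeros((n, n))
    rhs = np.zeros_like(p)
    # Natural end conditions (assumed): second derivative is zero at both ends.
    A[0, 0], A[0, 1] = 2.0, 1.0
    rhs[0] = 3.0 * (p[1] - p[0])
    A[-1, -2], A[-1, -1] = 1.0, 2.0
    rhs[-1] = 3.0 * (p[-1] - p[-2])
    # Interior rows: m[i-1] + 4 m[i] + m[i+1] = 3 (p[i+1] - p[i-1]).
    for i in range(1, n - 1):
        A[i, i - 1:i + 2] = 1.0, 4.0, 1.0
        rhs[i] = 3.0 * (p[i + 1] - p[i - 1])
    return np.linalg.solve(A, rhs)

def hermite_eval(points, tangents, u):
    """Evaluate the curve at parameter u in [0, n-1] (unit spacing per segment)."""
    p = np.asarray(points, dtype=float)
    i = min(int(np.floor(u)), len(p) - 2)        # segment index
    s = u - i                                    # local parameter in [0, 1]
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return h00 * p[i] + h10 * tangents[i] + h01 * p[i + 1] + h11 * tangents[i + 1]
```

For sampling points that lie on a straight line with equal spacing, the corrected tangents all equal the common step vector, which is a quick sanity check on the system.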
[0010] Preferably, step S3 includes the following steps: The continuous behavior curve segment within the preset sliding time window is resampled at equal intervals of the normalized time parameter to obtain multiple discrete sampling points. The first-, second-, and third-order derivative vectors at each sampling point are calculated by numerical differences. The direction of the first-order derivative vector is used as the unit tangent vector. The second-order derivative vector is orthogonalized against the unit tangent vector and normalized to obtain the second orthogonal vector. The third-order derivative vector is orthogonalized against the unit tangent vector and the second orthogonal vector and normalized to obtain the third orthogonal vector. This generates a sequence of mutually orthogonal Frenet frame vectors in high-dimensional space. The magnitude of the derivative of the unit tangent vector with respect to the arc length parameter is taken as the curvature. The instantaneous rate of curvature change is obtained by differentiating the curvature with respect to the arc length parameter. The inner product of the arc-length derivative of the second vector with the third vector in the Frenet frame vector sequence is taken as the instantaneous torsion.
Extract the subset of embedding vectors aligned with the time segment of the continuous behavior curve within the current sliding time window, and use kernel density estimation to obtain the probability density function of this subset; then calculate the differential entropy of the probability density function as the information entropy. Based on the current user's operation and maintenance role and the current time period, retrieve the historical information entropy values generated by the same role in the same time period from the store of historical session information entropy, and calculate their exponential moving average to obtain the dynamic baseline entropy.
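The quantities in this step can be sketched in NumPy as below. This is an illustrative approximation only: finite differences via `np.gradient`, a Gaussian kernel with a Scott-style bandwidth, a resubstitution entropy estimate, and the smoothing factor `alpha` are assumptions, and the function names are hypothetical.

```python
import numpy as np

def _unit(v, eps=1e-12):
    return v / np.maximum(np.linalg.norm(v, axis=1, keepdims=True), eps)

def _remove(v, basis):
    # Subtract the component of v along each (unit) basis vector.
    return v - np.sum(v * basis, axis=1, keepdims=True) * basis

def frenet_features(samples, dt):
    """samples: (m, d) equally spaced curve points. Returns curvature, its rate, torsion."""
    x = np.asarray(samples, dtype=float)
    d1 = np.gradient(x, dt, axis=0)
    d2 = np.gradient(d1, dt, axis=0)
    d3 = np.gradient(d2, dt, axis=0)
    speed = np.maximum(np.linalg.norm(d1, axis=1), 1e-12)
    T = _unit(d1)                                # unit tangent
    N = _unit(_remove(d2, T))                    # second orthogonal vector
    B = _unit(_remove(_remove(d3, T), N))        # third orthogonal vector
    # Derivatives with respect to arc length: d/ds = (1/speed) d/dt.
    kappa = np.linalg.norm(np.gradient(T, dt, axis=0), axis=1) / speed
    dkappa_ds = np.gradient(kappa, dt) / speed
    dN_ds = np.gradient(N, dt, axis=0) / speed[:, None]
    tau = np.sum(dN_ds * B, axis=1)
    return kappa, dkappa_ds, tau

def kde_entropy(vectors):
    """Differential entropy estimate from a Gaussian kernel density (assumed kernel)."""
    x = np.asarray(vectors, dtype=float)
    n, d = x.shape
    bw = n ** (-1.0 / (d + 4)) * x.std(axis=0).mean()   # Scott-style bandwidth (assumed)
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=2)
    log_k = -sq / (2 * bw**2) - 0.5 * d * np.log(2 * np.pi * bw**2)
    peak = log_k.max(axis=1, keepdims=True)
    log_p = peak[:, 0] + np.log(np.exp(log_k - peak).mean(axis=1))
    return float(-log_p.mean())

def baseline_entropy(history, alpha=0.2):
    """Exponential moving average over historical entropies, oldest first."""
    baseline = history[0]
    for h in history[1:]:
        baseline = alpha * h + (1 - alpha) * baseline
    return baseline
```

On a circular helix with unit radius and unit pitch coefficient, the interior samples should give curvature and torsion close to 0.5 and a curvature rate close to zero, which makes a convenient check.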
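[0011] Preferably, in S4, the dynamic anomaly index at the current time t is calculated by integrating, over the sliding time window ending at t, the product of the torsion coordination quantity and the exponential decay factor. The quantities appearing in the calculation formula are: the length of the sliding time window; the instantaneous rate of curvature change at each time within the window; the instantaneous torsion at that time; the positive part operation, applied to the product of the two; the information entropy H at that time; the dynamic baseline entropy at that time; and a preset minimum positive number.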
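A numerical sketch of this integral is given below. The exact expression of the exponential decay factor, including where the preset minimum positive number enters, is not recoverable from the text, so the default `exp(-1 / (deviation + eps))` is purely an assumption chosen so that `eps` guards a division; any other factor can be passed in through the hypothetical `decay` argument. Trapezoidal integration is likewise an assumed choice.

```python
import math

def anomaly_index(times, dkappa, tau, entropy, baseline, eps=1e-6, decay=None):
    """Integrate positive_part(dkappa * tau) * decay(|H - H_base|) over the window."""
    if decay is None:
        # Assumed form of the decay factor; not taken from the source.
        decay = lambda deviation: math.exp(-1.0 / (deviation + eps))
    integrand = [
        max(dk * tq, 0.0) * decay(abs(h - b))    # torsion coordination x decay factor
        for dk, tq, h, b in zip(dkappa, tau, entropy, baseline)
    ]
    total = 0.0
    for i in range(1, len(times)):               # trapezoidal rule over the window
        step = times[i] - times[i - 1]
        total += 0.5 * (integrand[i] + integrand[i - 1]) * step
    return total
```

Taking the positive part means that samples where the curvature rate and torsion have opposite signs contribute nothing to the index.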
[0012] Preferably, in step S5, inputting the dynamic anomaly index and the environmental context vector composed of the target device characteristics and the network security situation information into a preset neural ordinary differential equation model to obtain the time-varying derivative of the detection threshold, and generating a continuously changing detection threshold by numerical integration, includes the following steps: Before the first detection, a statistical quantile of the dynamic anomaly index collected during historical attack-free periods is calculated and used as the initial detection threshold. From the first detection onward, the initial detection threshold is used as the initial value, and the detection threshold is modeled as the state of a continuous-time dynamic system. The fully connected neural network in the neural ordinary differential equation model takes the concatenation of the current detection threshold, the dynamic anomaly index and the environmental context vector as input and outputs the time-varying derivative of the detection threshold. The rate of change of the detection threshold is therefore determined jointly by the current threshold level, the degree of anomaly and the operation and maintenance environment. An adaptive step-size numerical integrator integrates the time-varying derivative starting from the initial detection threshold to generate a detection threshold that changes continuously with time. The adaptive step-size numerical integrator automatically adjusts the integration step size according to the change of the time-varying derivative.
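The threshold dynamics can be illustrated with the small sketch below. It is not the patented model: the one-hidden-layer network, the parameter names, the 0.99 quantile, and the Heun-Euler embedded pair used for step-size control are all assumptions picked to keep the example short.

```python
import numpy as np

def init_threshold(history, q=0.99):
    # Quantile of anomaly indices from attack-free periods; q is an assumed value.
    return float(np.quantile(history, q))

def threshold_derivative(theta, index, context, params):
    # Fully connected network on [threshold, anomaly index, context vector].
    x = np.concatenate([[theta, index], context])
    hidden = np.tanh(params["W1"] @ x + params["b1"])
    return float(params["w2"] @ hidden + params["b2"])

def integrate_threshold(theta0, t0, t1, index_fn, context, params, tol=1e-4, h=0.1):
    """Adaptive-step integration of d(theta)/dt from t0 to t1 (Heun-Euler pair)."""
    t, theta = t0, theta0
    while t1 - t > 1e-12:
        h = min(h, t1 - t)
        k1 = threshold_derivative(theta, index_fn(t), context, params)
        k2 = threshold_derivative(theta + h * k1, index_fn(t + h), context, params)
        euler = theta + h * k1
        heun = theta + 0.5 * h * (k1 + k2)
        err = abs(heun - euler)                  # local error estimate
        if err <= tol:                           # accept the step
            t, theta = t + h, heun
        # Grow or shrink the step according to the error estimate.
        h *= min(2.0, max(0.2, 0.9 * np.sqrt(tol / (err + 1e-12))))
    return theta
```

Here `index_fn` is a hypothetical callable returning the dynamic anomaly index at a given time; an alarm would be raised whenever that index exceeds the integrated threshold.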
[0013] Preferably, in step S5, obtaining the verification result of the abnormal alarm from the preset alarm verification database, converting the verification result into a parameter adjustment signal, and updating the parameters of the neural ordinary differential equation model using the adjoint sensitivity method includes the following steps: Retrieve verification results corresponding to abnormal alarms from the preset alarm verification database. Verification results include confirmation of attack and false alarms. When the verification result confirms an attack, the corresponding parameter adjustment signal is assigned a positive value of one; when the verification result is a false alarm, the corresponding parameter adjustment signal is assigned a negative value of one. Define a cumulative parameter adjustment signal function from the initial time to the current time. This function is based on the difference between the detection threshold at the alarm time and the dynamic anomaly index, which is mapped and then weighted by the parameter adjustment signal. With the goal of maximizing the cumulative parameter adjustment signal function, the parameter gradient of the fully connected neural network is calculated by the adjoint sensitivity method, and the parameters of the fully connected neural network are updated along the rising direction of the gradient.
[0014] Compared with the prior art, the beneficial effects of the present invention are as follows. This invention constructs a heterogeneous temporal behavior graph, which uniformly models four types of heterogeneous nodes and typed directed edges carrying timestamps; it uses a temporal heterogeneous graph attention network to encode operations and generate embedding vectors, and then performs manifold parameterization on the time-ordered sequence to obtain continuous behavior curves that are twice continuously differentiable, thus realizing the complete transformation from discrete operations to high-dimensional behavioral manifolds. This invention extracts the instantaneous rate of curvature change and the instantaneous torsion using the Frenet frame, combines information entropy and dynamic baseline entropy, and obtains the dynamic anomaly index by multiplying the torsion coordination quantity by the exponential decay factor and integrating the product within a sliding time window, thereby achieving multi-dimensional composite dynamic anomaly quantification. This invention inputs the dynamic anomaly index and an environmental context vector into a neural ordinary differential equation model, generates a continuously changing detection threshold through adaptive step-size numerical integration, and updates the model parameters online using the adjoint sensitivity method based on alarm verification results, forming a closed-loop mechanism of detection, feedback, and optimization. This allows the threshold to be adjusted in real time according to the degree of anomaly and the operational environment, continuously reducing the false alarm rate and improving timeliness.
Attached Figure Description
[0015] Figure 1 is a schematic diagram of the method flow of the present invention.
Detailed Implementation
[0016] The technical solution of the present invention will be clearly and completely described below with reference to the embodiments. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments.
[0017] Referring to Figure 1, this invention is a dynamic detection method for abnormal network security operation and maintenance behavior based on artificial intelligence, comprising the following steps:
S1: Obtain operation logs from the network security operation and maintenance audit system, and simultaneously extract target device characteristics and current network security situation information to construct a heterogeneous temporal behavior graph, where the nodes in the heterogeneous temporal behavior graph include user nodes, target device nodes, command nodes and session nodes, and the edges represent the operation relationships recorded in the operation logs;
S2: Encode each operation in the operation log using a preset temporal heterogeneous graph attention network to generate an embedding vector; group the operations into operation and maintenance sessions, and arrange the embedding vectors of the operations within the same session in order of operation time to form the embedding vector sequence of the session; perform manifold parameterization on the embedding vector sequence and apply continuity constraints for correction to obtain the continuous behavior curve of the session, which is twice continuously differentiable in high-dimensional space;
S3: Based on a preset sliding time window, calculate the Frenet frame for the continuous behavior curve segment within the current sliding time window, extract the instantaneous rate of curvature change and the instantaneous torsion, calculate the information entropy based on the embedding vectors within the sliding time window, and obtain the dynamic baseline entropy from the embedding vector statistics of historical operation and maintenance sessions of the same role in the same period;
S4: Take the positive part of the product of the instantaneous rate of curvature change and the instantaneous torsion to obtain the torsion coordination quantity, and construct the exponential decay factor from the absolute value of the difference between the information entropy and the dynamic baseline entropy; multiply the torsion coordination quantity by the exponential decay factor and integrate the product within the sliding time window to obtain the dynamic anomaly index at the current moment;
S5: Input the dynamic anomaly index and the environmental context vector composed of the target device characteristics and the network security situation information into a preset neural ordinary differential equation model to obtain the time-varying derivative of the detection threshold, and generate a continuously changing detection threshold through numerical integration; compare the dynamic anomaly index with the detection threshold in real time, generate an anomaly alarm when the dynamic anomaly index exceeds the detection threshold, obtain the verification result of the anomaly alarm from the preset alarm verification database, convert the verification result into a parameter adjustment signal, and update the parameters of the neural ordinary differential equation model through the adjoint sensitivity method.
[0018] Specifically, the system obtains operation logs from the network security operation and maintenance audit system, and simultaneously extracts two types of environmental information: first, it obtains characteristics of the target device such as device type, operating system, importance level, and internet exposure identifier through the asset management interface; second, it obtains network security situation information such as the overall network attack alarm level, the number of abnormal logins in the same network segment, the scanning and detection frequency, and the threat intelligence matching level through the situational awareness platform API. The system parses the user identifier, target device identifier, command string, session identifier, and operation timestamp in the operation logs. After standardizing and deduplicating the command string, it dynamically creates user nodes, target device nodes, command nodes, and session nodes, and establishes typed directed edges carrying timestamps between nodes to complete the construction of a heterogeneous temporal behavior graph. A pre-trained temporal heterogeneous graph attention network is used to perform multi-layer attention encoding on the command node and its neighboring nodes corresponding to each operation and maintenance operation, generating embedding vectors. The embedding vectors within the same operation and maintenance session are arranged in chronological order of operation time to form an embedding vector sequence. Using each embedding vector as a sampling point, a cubic Hermite piecewise interpolation polynomial is constructed, and a continuity constraint is applied to correct the tangent vector, ultimately obtaining a continuous behavior curve with second-order continuous differentiability. A preset sliding time window is used to slide on the continuous behavior curve, and the curve segments within the window are resampled at equal intervals. 
The first- to third-order derivative vectors at each sampling point are calculated, and a Frenet frame is generated through Gram-Schmidt orthogonalization to extract the instantaneous rate of curvature change and the instantaneous torsion. At the same time, the corresponding subset of embedding vectors within the window is extracted, and the information entropy is calculated through kernel density estimation. The dynamic baseline entropy is obtained from historical sessions of the same role in the same period as a reference. The positive part of the product of the instantaneous rate of curvature change and the instantaneous torsion is taken as the torsion coordination quantity. An exponential decay factor is constructed from the degree of deviation of the information entropy from the dynamic baseline entropy. The two are multiplied and integrated within the sliding time window to obtain the dynamic anomaly index at the current moment. The dynamic anomaly index is concatenated with an environmental context vector composed of target device characteristics and network security situation information, and input into a neural ordinary differential equation model. The model outputs the time-varying derivative of the detection threshold, and a continuously changing detection threshold is generated through adaptive step-size numerical integration. The dynamic anomaly index is compared with the threshold in real time, and an alarm is generated when the threshold is exceeded. The manual verification results obtained from the alarm verification database are converted into parameter adjustment signals, and the model parameters are updated through the adjoint sensitivity method, forming a closed loop of detection, verification, and optimization.
[0019] In one embodiment of the present invention, step S1 includes the following steps: Whenever an operation log is retrieved from the network security operation and maintenance audit system, the characteristics of the target device involved in the operation log and the current network security situation information are extracted synchronously, and the user identifier, target device identifier, command string, session identifier and operation timestamp in the operation log are parsed online. The command strings are standardized and deduplicated so that multiple command strings with the same semantics are mapped to the same command node; Based on the analysis results of the operation and maintenance logs, user nodes, target device nodes, command nodes, and session nodes that do not currently exist are dynamically created in the heterogeneous time-series behavior graph. At the same time, typed directed time-series edges are established. The typed directed time-series edges include execution edges from user nodes to command nodes, operation edges from command nodes to target device nodes, ownership edges from command nodes to session nodes, and initiation edges from user nodes to session nodes. Each edge carries an operation timestamp.
[0020] Specifically, the network security operation and maintenance audit system is a unified platform that aggregates data collected by various auditing components such as bastion hosts, host audit agents, and database audit gateways, and outputs operation logs in a structured format. Each operation log contains at least a user identifier, target device identifier, command string, session identifier, and operation timestamp. The command string records the actual commands entered by the operation personnel or the names of application programming interface calls initiated during the operation; the session identifier uniquely identifies the session to which an operation connection belongs from its establishment to its disconnection. Simultaneously, environmental information is extracted from each operation log; the system's asset management interface is used to obtain characteristics such as the target device type, operating system type (e.g., Linux), importance level, and internet exposure identifier; and the situational awareness platform API is used to obtain network security situational information such as the current network-wide attack alert level, the number of abnormal logins within the same network segment, the frequency of scanning and probing within the same network segment, and the threat intelligence matching level.
[0021] In different operation and maintenance environments, commands with the same function may appear in various literal forms due to differences in parameter order, option syntax, and the number of spaces. To aggregate semantically identical commands to the same command node, standardization and deduplication processing of the command strings is required. First, the command name is extracted by taking the part before the first space or special character from the command string and removing the command line parameters and options. Then, the command name is normalized by converting the extracted command names to lowercase and removing leading and trailing whitespace characters. Finally, deduplication mapping is performed; a global mapping table from normalized command names to command nodes is maintained; when processing each operation and maintenance log, the mapping table is queried using the normalized command name; if a command node with the corresponding normalized command name already exists in the mapping table, it is reused directly; if no corresponding entry exists in the mapping table, a new command node is created and the mapping relationship is written to the mapping table.
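The three stages above (name extraction, normalization, deduplication mapping) can be sketched as follows. The set of "special characters" used as delimiters and the class name `CommandRegistry` are assumptions; leading whitespace is stripped before splitting so that the first token is never empty.

```python
import re

def normalize_command(command: str) -> str:
    """Return the lowercase command name without parameters or options."""
    stripped = command.strip()
    # Split at the first whitespace or shell special character (assumed delimiter set).
    name = re.split(r"[\s;|&<>()]", stripped, maxsplit=1)[0]
    return name.lower()

class CommandRegistry:
    """Global mapping table from normalized command names to command node ids."""

    def __init__(self):
        self._nodes = {}

    def get_or_create(self, command: str) -> int:
        name = normalize_command(command)
        if name not in self._nodes:              # no entry yet: create a new node
            self._nodes[name] = len(self._nodes)
        return self._nodes[name]                 # existing entry is reused
```

With this sketch, `ls -l /var` and `LS   -a` map to the same command node, while `cat` receives a separate one.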
[0022] The heterogeneous temporal behavior graph contains four types of nodes: user nodes, target device nodes, command nodes, and session nodes. Each type of node is uniquely identified by the corresponding field value. For the user identifier, target device identifier, normalized command name, and session identifier obtained by parsing the current operation log, the heterogeneous temporal behavior graph is checked in turn for an existing corresponding node. If no user node with the user identifier exists, one is created, with the user identifier as its node identifier; if no target device node with the target device identifier exists, one is created, with the target device identifier as its node identifier; if no command node with the normalized command name exists, one is created, and the set of original command string samples for that command node is recorded; if no session node with the session identifier exists, one is created, with the session identifier as its node identifier. If a corresponding node already exists, creation is skipped.
[0023] After ensuring that all four types of nodes exist in the heterogeneous temporal behavior graph, four types of directed temporal edges are established based on the operation relationships recorded in the operation log. Execution edges point from user nodes to command nodes, indicating that the user executed the command. Operation edges point from command nodes to target device nodes, indicating that the command was applied to the target device. Belonging edges point from command nodes to session nodes, indicating that the command operation occurred within that session. Initiating edges point from user nodes to session nodes, indicating that the user initiated the session. If an operation log from the same session has already been processed, the initiating edge already exists; therefore, before a new initiating edge is established, the graph is checked for an existing initiating edge between the same user node and session node, and if one exists, creation is skipped. Each directed temporal edge carries the operation timestamp of the current operation log at the time it is established.
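As a non-limiting illustration, the node and edge construction described above may be sketched in Python as follows. The class name `BehaviorGraph`, the tuple-based node keys, and the in-memory storage layout are assumptions introduced here for illustration only.

```python
class BehaviorGraph:
    """Minimal in-memory heterogeneous temporal behavior graph (illustrative)."""

    def __init__(self):
        self.nodes = {}          # (node_type, identifier) -> attribute dict
        self.edges = []          # (edge_type, source, target, timestamp)
        self._initiated = set()  # (user_key, session_key) pairs already linked

    def _ensure_node(self, node_type, identifier):
        key = (node_type, identifier)
        # Creation is skipped when the node already exists.
        self.nodes.setdefault(key, {"samples": set()})
        return key

    def add_log(self, user, device, command_name, raw_command, session, ts):
        u = self._ensure_node("user", user)
        d = self._ensure_node("device", device)
        c = self._ensure_node("command", command_name)
        s = self._ensure_node("session", session)
        # Keep the original command string as a sample on the command node.
        self.nodes[c]["samples"].add(raw_command)
        self.edges.append(("execution", u, c, ts))
        self.edges.append(("operation", c, d, ts))
        self.edges.append(("belonging", c, s, ts))
        if (u, s) not in self._initiated:
            # The initiating edge is created once per user and session pair.
            self._initiated.add((u, s))
            self.edges.append(("initiating", u, s, ts))


graph = BehaviorGraph()
graph.add_log("alice", "db01", "ls", "ls -la", "sess1", 100.0)
graph.add_log("alice", "db01", "cat", "cat /etc/hosts", "sess1", 105.0)
```

Processing two logs of the same session in this sketch yields five nodes and seven edges, of which exactly one is an initiating edge.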
[0024] In one embodiment of the present invention, in step S2, each operation in the operation log is encoded using a preset temporal heterogeneous graph attention network to generate an embedding vector, including the following steps: For each edge in the heterogeneous temporal behavior graph, an edge attribute vector is constructed. The edge attribute vector is formed by concatenating the one-hot encoding of the edge type and the Fourier encoding of the operation timestamp. In each layer of the preset temporal heterogeneous graph attention network, the command node corresponding to the operation is taken as the central node, and the nodes directly connected to the central node are taken as the neighbor nodes. The query vector of the central node and the key vectors of the neighbor nodes are obtained by node-type-specific linear projections, and their dot product is then computed. The edge attribute vector of the connecting edge is linearly projected and added to the dot product result as a bias term. After normalization, the attention weights are obtained. The value vectors of the neighbor nodes are then weighted and aggregated using the attention weights. After residual connection and layer normalization, the result of this layer is output. After stacking multiple layers, the embedding vector is obtained.
[0025] Specifically, there are four types of edges: execution edges, operation edges, belonging edges, and initiating edges, each corresponding to a 4-dimensional one-hot vector. If the edge is an execution edge, its one-hot encoding is $[1,0,0,0]$; if it is an operation edge, the one-hot encoding is $[0,1,0,0]$; if it is a belonging edge, the one-hot encoding is $[0,0,1,0]$; if it is an initiating edge, the one-hot encoding is $[0,0,0,1]$. A set of $k$ periods covering common operation and maintenance work patterns is preselected. For example, if $k=4$, the periods are $T_1=3600$ seconds (a 1-hour cycle), $T_2=86400$ seconds (a 1-day cycle), $T_3=604800$ seconds (a 1-week cycle), and $T_4=2592000$ seconds (a 30-day cycle). The angular frequency for each period is defined as $\omega_j = 2\pi/T_j$, where $j=1,2,3,4$. For the edge operation timestamp $t$ (in seconds), the $(2j-1)$-th dimension of its Fourier encoding vector $\phi(t)$ is calculated as $\sin(\omega_j t)$, and the $2j$-th dimension is calculated as $\cos(\omega_j t)$. The 4-dimensional one-hot vector is concatenated with the 8-dimensional Fourier encoding vector to form the 12-dimensional edge attribute vector $e_{uv}$, where $u$ and $v$ represent the source node and target node of this directed edge, respectively.
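As a non-limiting illustration, the 12-dimensional edge attribute vector may be assembled in Python as follows. The ordering of the one-hot positions and the assignment of sine to the odd dimensions and cosine to the even dimensions are assumptions made here for illustration; the function name is likewise hypothetical.

```python
import math

# Assumed one-hot ordering of the four edge types.
EDGE_TYPES = ("execution", "operation", "belonging", "initiating")
# Periods in seconds: 1 hour, 1 day, 1 week, 30 days.
PERIODS = (3600.0, 86400.0, 604800.0, 2592000.0)


def edge_attribute_vector(edge_type, timestamp):
    """Concatenate the edge-type one-hot code with the Fourier time code."""
    one_hot = [1.0 if edge_type == t else 0.0 for t in EDGE_TYPES]
    fourier = []
    for period in PERIODS:
        omega = 2.0 * math.pi / period  # angular frequency of this period
        fourier.append(math.sin(omega * timestamp))
        fourier.append(math.cos(omega * timestamp))
    return one_hot + fourier


example = edge_attribute_vector("operation", 0.0)
```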
[0026] For each type of node in the heterogeneous temporal behavior graph, a learnable embedding matrix is established: user embedding matrix, target device embedding matrix, command embedding matrix, and session embedding matrix. All elements of the embedding matrices are initialized using a Xavier uniform distribution, ensuring that the variance of each dimension of the initial embedding vector remains at 1 / 128 and the bias is initialized to zero. When a new node is dynamically created in the graph based on the operation logs, the unique identifier of that node is used as an index to extract the corresponding 128-dimensional vector from the relevant embedding matrix as the initial node feature. All elements of the embedding matrix are updated collaboratively during the overall training process.
[0027] The temporal heterogeneous graph attention network consists of $L$ stacked layers, typically $L=2$. For each operation log, the corresponding command node $v_{\mathrm{cmd}}$ is taken as the central node, and the nodes directly connected to $v_{\mathrm{cmd}}$ through the edges generated by this log form the neighbor node set $\mathcal{N}(v_{\mathrm{cmd}})$. Since a single log entry creates an execution edge, an operation edge, and a belonging edge simultaneously, the neighbor node set $\mathcal{N}(v_{\mathrm{cmd}})$ is fixed to contain three nodes, namely the user node $v_{\mathrm{usr}}$, the target device node $v_{\mathrm{dev}}$, and the session node $v_{\mathrm{ses}}$.
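[0028] Let the node representations in the $(l-1)$-th layer be $h_{\mathrm{cmd}}^{(l-1)}$ for the command node, $h_{\mathrm{usr}}^{(l-1)}$ for the user node, $h_{\mathrm{dev}}^{(l-1)}$ for the target device node, and $h_{\mathrm{ses}}^{(l-1)}$ for the session node; when $l=1$, these are the initial features. The calculation process for the $l$-th layer is as follows. First, a query projection matrix $W_Q^{(l)} \in \mathbb{R}^{d_k \times 128}$ is set for the command type, and the query vector of the central command node is computed as $q = W_Q^{(l)} h_{\mathrm{cmd}}^{(l-1)}$. For each of the three neighbor node types $\rho \in \{\mathrm{usr}, \mathrm{dev}, \mathrm{ses}\}$, a key projection matrix $W_{K,\rho}^{(l)} \in \mathbb{R}^{d_k \times 128}$ and a value projection matrix $W_{V,\rho}^{(l)} \in \mathbb{R}^{d_v \times 128}$ are set, where $d_k$ and $d_v$ represent the key and value dimensions, respectively, with a typical value of 64 for both, which is half the embedding dimension. All projection matrices are learnable parameters and are initialized using Xavier uniform initialization. For each neighbor node $u$, the matrices corresponding to its type $\rho(u)$ are selected to obtain the key vector $k_u = W_{K,\rho(u)}^{(l)} h_u^{(l-1)}$ and the value vector $v_u = W_{V,\rho(u)}^{(l)} h_u^{(l-1)}$.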
[0029] Secondly, the edge attribute vector $e_u$ of the directed edge connecting the central node and neighbor node $u$ is linearly projected into a scalar bias. A shared, learnable edge attribute projection vector $w_e \in \mathbb{R}^{12}$ is set; initially, all elements of this vector are set to 0.01 to minimize the initial contribution of the bias term. The scalar bias is $b_u = w_e^{\top} e_u$. For the execution edge between the command node and the user node, the 12-dimensional attribute vector of that execution edge is used; for the operation edge from the command node to the target device node, the attribute vector of the operation edge is used; for the belonging edge from the command node to the session node, the attribute vector of the belonging edge is used.
[0030] Finally, the attention score for each neighbor node $u$ is calculated as $a_u = q^{\top} k_u + b_u$. The scores of the three neighbor nodes are normalized using the softmax function to obtain the attention weights $\alpha_u = \exp(a_u) / \sum_{u'} \exp(a_{u'})$. The value vectors are summed with these attention weights to form the aggregated feature $z = \sum_u \alpha_u v_u$. Residual connection and layer normalization are then applied to the aggregated feature to obtain the final representation $h_{\mathrm{cmd}}^{(l)}$ of the command node in layer $l$. Layer normalization calculates the mean and variance along the feature dimension and normalizes accordingly. It contains learnable scaling and offset parameters, which are initialized to 1 and 0, respectively.
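As a non-limiting illustration, the score, softmax, and aggregation steps for one central command node may be sketched in Python as follows. The function names and the argument layout are assumptions introduced here; the residual connection and layer normalization are omitted from the sketch, and the dot product is left unscaled as in the description above.

```python
import math


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(h_center, neighbors, w_query, w_key, w_value, w_edge):
    """neighbors: list of (node_type, features, edge_attribute_vector).

    w_key and w_value map a node type to its projection matrix.
    Returns the attention weights and the aggregated value vector.
    """
    query = matvec(w_query, h_center)
    scores, values = [], []
    for node_type, features, edge_attr in neighbors:
        key = matvec(w_key[node_type], features)
        bias = sum(w * e for w, e in zip(w_edge, edge_attr))  # scalar edge bias
        scores.append(sum(q * k for q, k in zip(query, key)) + bias)
        values.append(matvec(w_value[node_type], features))
    weights = softmax(scores)
    dim = len(values[0])
    aggregated = [sum(a * v[i] for a, v in zip(weights, values))
                  for i in range(dim)]
    return weights, aggregated
```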
[0031] During the execution of the multi-layer network, neighbor nodes also update their representations through a similar attention aggregation process. Specifically, user nodes aggregate information from the command nodes they have executed, target device nodes aggregate information from the command nodes that have operated on them, and session nodes aggregate information from the command nodes they contain. When stacking $L=2$ layers, the final representations of the four types of nodes output by the first layer serve as the input to the second layer, where the projection, attention calculation, aggregation, and normalization operations are performed again. Ultimately, the command node representation output by the second layer, $h_{\mathrm{cmd}}^{(2)}$, is the 128-dimensional embedding vector corresponding to this operation log.
[0032] As operation logs continuously flow in, the degree of user nodes and target device nodes in the heterogeneous temporal behavior graph increases over time. To control the complexity of a single attention calculation, a time-based nearest neighbor sampling strategy is adopted for attention calculations centered on either user nodes or target device nodes: if the actual number of neighbors exceeds a preset limit of 64, the 64 neighbors with the most recent operation timestamps are selected to participate in the calculation; otherwise, all neighbors participate. The temporal information contained in the Fourier encoding of the edge attribute vectors allows the network to learn to assign higher weights to more recent operations during the attention calculation. This strategy bounds the time complexity of a single attention calculation by a constant.
[0033] In one embodiment of the present invention, in step S2, the embedding vector sequence is subjected to manifold parameterization and corrected by applying continuity constraints to obtain a continuous behavior curve of the operation and maintenance session that is twice continuously differentiable in a high-dimensional space, including the following steps: For the embedding vector sequence arranged in chronological order within the operation and maintenance session, each embedding vector in the sequence is taken as a sampling point, and the tangent vector at each sampling point is calculated from the position difference between adjacent sampling points. Between adjacent sampling points, a piecewise cubic Hermite interpolation polynomial is constructed using the positions and tangent vectors of the sampling points, so that adjacent polynomial segments satisfy positional continuity and first-derivative continuity at the splicing point. A second-order continuity constraint is then applied to the tangent vectors of all sampling points along the entire curve, and the tangent vector at each point is corrected by solving the three-moment equations, yielding a continuous behavior curve that is twice continuously differentiable.
[0034] Specifically, all embedding vectors within the same operation and maintenance session are sorted in ascending order by operation timestamp to obtain the embedding vector sequence $\{x_0, x_1, \ldots, x_N\}$, where $x_i$ corresponds to time point $t_i$ and the time points satisfy $t_0 < t_1 < \cdots < t_N$; $N+1$ represents the total number of operations included in the operation and maintenance session. In the subsequent manifold parameterization process, each embedding vector $x_i$ is treated as a discrete sampling point in the high-dimensional space. The original operation timestamps are linearly normalized to the $[0,1]$ interval and used as the curve parameter $s_i = (t_i - t_0)/(t_N - t_0)$, so that the parameter starts from $s_0 = 0$. This yields a sequence of sampling points arranged according to the normalized time parameter, $(s_i, x_i)$, $i=0,\ldots,N$.
[0035] For each sampling point $x_i$, the tangent vector at that point is calculated from the positional difference between adjacent sampling points. For internal points $i=1,\ldots,N-1$, a central difference scheme is used to reduce boundary effects, and the tangent vector is $m_i = (x_{i+1} - x_{i-1})/(s_{i+1} - s_{i-1})$. For the two endpoints (start and end), a one-sided difference is used, and the tangent vectors are $m_0 = (x_1 - x_0)/(s_1 - s_0)$ and $m_N = (x_N - x_{N-1})/(s_N - s_{N-1})$.
[0036] Between every two adjacent sampling points, a cubic Hermite interpolation polynomial is constructed, constrained by the position vectors and tangent vectors at both ends. Let the parameter interval corresponding to the current segment be $[s_i, s_{i+1}]$, let the interval length be $h_i = s_{i+1} - s_i$, and introduce the local normalized parameter $\xi = (s - s_i)/h_i$, so that $\xi$ takes values within the range $[0,1]$. The piecewise curve expression between these adjacent sampling points is $C_i(\xi) = H_{00}(\xi)\,x_i + H_{10}(\xi)\,h_i\,m_i + H_{01}(\xi)\,x_{i+1} + H_{11}(\xi)\,h_i\,m_{i+1}$, where $H_{00}(\xi) = 2\xi^3 - 3\xi^2 + 1$, $H_{10}(\xi) = \xi^3 - 2\xi^2 + \xi$, $H_{01}(\xi) = -2\xi^3 + 3\xi^2$, and $H_{11}(\xi) = \xi^3 - \xi^2$. $C_i$ denotes the $i$-th cubic Hermite interpolation polynomial constructed between adjacent sampling points $x_i$ and $x_{i+1}$; as $\xi$ changes continuously from 0 to 1, it traces a smooth curve connecting these two points. By splicing together all segments, the entire continuous behavior curve $C(s)$ is obtained. By construction, the positions at the endpoints of each segment are $x_i$ and $x_{i+1}$, and the first derivatives with respect to $s$ are $m_i$ and $m_{i+1}$. Thus, all segments are continuous in position and first derivative at the splicing points, meaning the entire curve has C¹ smoothness.
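As a non-limiting illustration, one cubic Hermite segment may be evaluated in Python as follows, using the standard cubic Hermite basis polynomials. The function name and the list-based vector representation are assumptions introduced here for illustration.

```python
def hermite_segment(x0, x1, m0, m1, h, xi):
    """Evaluate one cubic Hermite segment at local parameter xi in [0, 1].

    x0, x1: position vectors at the two ends of the segment
    m0, m1: tangent vectors (derivatives with respect to s) at the two ends
    h:      interval length in the normalized time parameter
    """
    h00 = 2 * xi**3 - 3 * xi**2 + 1
    h10 = xi**3 - 2 * xi**2 + xi
    h01 = -2 * xi**3 + 3 * xi**2
    h11 = xi**3 - xi**2
    return [h00 * a + h10 * h * ma + h01 * b + h11 * h * mb
            for a, b, ma, mb in zip(x0, x1, m0, m1)]


# A straight line x = 2 s on [0, 0.5] is reproduced exactly.
midpoint = hermite_segment([0.0], [1.0], [2.0], [2.0], 0.5, 0.5)
```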
[0037] C¹ continuity is insufficient to guarantee the continuity of differential geometric quantities such as curvature at the sampling points; the smoothness of the curve needs to be further improved to C² continuity. To this end, the tangent vectors at the internal sampling points are corrected, the corrected tangent vectors are denoted as $\tilde{m}_i$ ($i=0,\ldots,N$), and the second derivatives of the left and right segments are required to be equal at each internal splicing point. Re-deriving this condition from the original Hermite interpolation formula shows that it is equivalent to the three-moment equation system $h_i\,\tilde{m}_{i-1} + 2(h_{i-1} + h_i)\,\tilde{m}_i + h_{i-1}\,\tilde{m}_{i+1} = 3\left[ h_i\,\frac{x_i - x_{i-1}}{h_{i-1}} + h_{i-1}\,\frac{x_{i+1} - x_i}{h_i} \right]$, where $i = 1, 2, \ldots, N-1$ and $h_i = s_{i+1} - s_i$. The first and last tangent vectors keep their initial difference values as boundary conditions, i.e., $\tilde{m}_0 = m_0$ and $\tilde{m}_N = m_N$. This system of equations is essentially 128 independent tridiagonal linear systems (one for each embedding dimension); the entire set of corrected tangent vectors can be solved efficiently using the chasing method (Thomas algorithm). Replacing the original $m_i$ with the corrected tangent vectors $\tilde{m}_i$ and substituting them back into the piecewise cubic Hermite interpolation formula yields the final continuous behavior curve, which is twice continuously differentiable.
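As a non-limiting illustration, the tangent correction for a single embedding dimension may be sketched in Python as follows. The sketch assumes the standard second-derivative continuity condition for cubic Hermite segments with clamped end tangents; the function names are hypothetical, and in practice the same solve would be repeated for each of the 128 dimensions.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Chasing method (Thomas algorithm); lower[0] and upper[-1] are unused."""
    n = len(diag)
    c_prime, d_prime = [0.0] * n, [0.0] * n
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


def c2_tangents(s, y, m_start, m_end):
    """Corrected tangents for scalar samples y at parameters s (one dimension)."""
    n = len(s) - 1
    h = [s[i + 1] - s[i] for i in range(n)]
    # First row fixes the start tangent to its one-sided difference value.
    lower, diag, upper, rhs = [0.0], [1.0], [0.0], [m_start]
    for i in range(1, n):
        lower.append(h[i])
        diag.append(2.0 * (h[i - 1] + h[i]))
        upper.append(h[i - 1])
        rhs.append(3.0 * (h[i] * (y[i] - y[i - 1]) / h[i - 1]
                          + h[i - 1] * (y[i + 1] - y[i]) / h[i]))
    # Last row fixes the end tangent.
    lower.append(0.0)
    diag.append(1.0)
    upper.append(0.0)
    rhs.append(m_end)
    return solve_tridiagonal(lower, diag, upper, rhs)
```

Because a cubic polynomial is itself a twice continuously differentiable piecewise cubic, sampling $y = s^3$ with exact end slopes is a convenient check: the solved tangents should equal $3 s_i^2$ at every sampling point.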
[0038] In one embodiment of the present invention, step S3 includes the following steps: The continuous behavior curve segment within a preset sliding time window is resampled at equal intervals according to the normalized time parameter to obtain multiple discrete sampling points. The first-order derivative vector and the second-order derivative vector at each sampling point are calculated by numerical difference. The direction of the first-order derivative vector is used as the unit tangent vector. The second-order derivative vector is orthogonalized and normalized with respect to the unit tangent vector to obtain the second orthogonal vector. The third-order derivative vector is orthogonalized and normalized with respect to the unit tangent vector and the second orthogonal vector to obtain the third orthogonal vector. Thus, a sequence of mutually orthogonal Frenet frame vectors in high-dimensional space is generated. The magnitude of the derivative vector of the unit tangent vector with respect to the arc length parameter is calculated as the curvature. The instantaneous rate of change of curvature is obtained by taking the derivative of the curvature with respect to the arc length parameter. The inner product of the derivative of the second vector with respect to the arc length parameter and the third vector in the Frenet frame vector sequence is taken as the instantaneous torsion.
A subset of embedding vectors aligned with the time segment of the continuous behavior curve within the current sliding time window is extracted, and kernel density estimation is used to obtain the probability density function of this embedding vector subset; the differential entropy is then calculated from the probability density function as the information entropy. Based on the current user's operation and maintenance role and the current time period, the historical information entropy generated by the same operation and maintenance role in the same time period is extracted from the historical operation and maintenance session information entropy storage, and the exponential moving average of the historical information entropy is calculated to obtain the dynamic baseline entropy.
[0039] Specifically, the preset sliding time window length is $T_w$. The corresponding window width in the normalized time parameter domain, $\Delta s_w$, is obtained by ratio conversion between the actual duration and the normalized scale. The left boundary of the current sliding window is set as the normalized parameter $s_L$, and the right boundary is $s_R = s_L + \Delta s_w$. On the continuous behavior curve within this sliding time window, equal-interval resampling is performed in the parameter $s$, with the number of resampling points set to $M+1$, where $M$ is typically 32, to ensure sufficient geometric feature resolution while controlling computational load. Each resampling point is labeled with an index $m$ ($m=0,1,\ldots,M$), and the corresponding normalized parameter sequence is $s_m = s_L + m\,\delta$, where $\delta = \Delta s_w / M$. Using the piecewise expression of the continuous behavior curve $C(s)$, the 128-dimensional position vector at each resampling point is calculated as $p_m = C(s_m)$.
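[0040] For the sequence of equally spaced resampling points, the first derivative vector $p'_m$ and second derivative vector $p''_m$ at each point are calculated using difference schemes. For interior points $m=1,\ldots,M-1$, the central difference formulas are $p'_m = (p_{m+1} - p_{m-1})/(2\delta)$ and $p''_m = (p_{m+1} - 2p_m + p_{m-1})/\delta^2$, where $\delta$ is the resampling step size. For boundary points $m=0$ and $m=M$, one-sided second-order differences are used to ensure numerical stability, i.e., $p'_0 = (-3p_0 + 4p_1 - p_2)/(2\delta)$, $p'_M = (3p_M - 4p_{M-1} + p_{M-2})/(2\delta)$, $p''_0 = (2p_0 - 5p_1 + 4p_2 - p_3)/\delta^2$, and $p''_M = (2p_M - 5p_{M-1} + 4p_{M-2} - p_{M-3})/\delta^2$.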
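As a non-limiting illustration, the difference schemes above may be sketched in Python for a single embedding dimension as follows. The one-sided boundary stencils used here are the standard second-order-accurate formulas and are an assumption as to the exact coefficients intended; the function name is hypothetical.

```python
def difference_derivatives(p, step):
    """First and second derivatives of equally spaced scalar samples p.

    Requires at least four samples so that the one-sided boundary
    stencils are defined.
    """
    last = len(p) - 1
    d1 = [0.0] * (last + 1)
    d2 = [0.0] * (last + 1)
    for m in range(1, last):  # central differences at interior points
        d1[m] = (p[m + 1] - p[m - 1]) / (2 * step)
        d2[m] = (p[m + 1] - 2 * p[m] + p[m - 1]) / step**2
    # One-sided second-order differences at the two boundary points.
    d1[0] = (-3 * p[0] + 4 * p[1] - p[2]) / (2 * step)
    d1[last] = (3 * p[last] - 4 * p[last - 1] + p[last - 2]) / (2 * step)
    d2[0] = (2 * p[0] - 5 * p[1] + 4 * p[2] - p[3]) / step**2
    d2[last] = (2 * p[last] - 5 * p[last - 1]
                + 4 * p[last - 2] - p[last - 3]) / step**2
    return d1, d2


M = 32
delta = 1.0 / M
samples = [(m * delta) ** 2 for m in range(M + 1)]  # p(s) = s^2
first, second = difference_derivatives(samples, delta)
```

For the quadratic test curve, all stencils are exact up to rounding: the first derivative equals $2s$ and the second derivative equals 2 at every resampling point.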
[0041] In 128-dimensional space, the direction of $p'_m$ defines the unit tangent vector $\mathbf{T}_m = p'_m / \|p'_m\|$. Gram-Schmidt orthogonalization is then performed sequentially on the unit tangent vector and the higher-order derivative vectors to generate a sequence of mutually orthogonal Frenet frame vectors. Specifically, $p''_m$ is first orthogonalized against $\mathbf{T}_m$ to obtain the second orthogonal vector $\hat{n}_m = p''_m - (p''_m \cdot \mathbf{T}_m)\,\mathbf{T}_m$, and normalization yields $\mathbf{N}_m = \hat{n}_m / \|\hat{n}_m\|$. Although the Frenet frame in high-dimensional space contains multiple normal vectors, only the first three orthogonal vectors need to be retained to extract curvature and torsion, because the instantaneous torsional characteristics of the operation and maintenance behavior curve can be characterized by $\mathbf{T}_m$, $\mathbf{N}_m$, and the third orthogonal vector $\mathbf{B}_m$. To generate $\mathbf{B}_m$ deterministically, the third derivative vector $p'''_m$ at this point is calculated using a five-point central difference scheme (boundary points use the corresponding one-sided five-point difference); $p'''_m$ is then orthogonalized against $\mathbf{T}_m$ and $\mathbf{N}_m$ in turn by Gram-Schmidt orthogonalization and normalized to obtain the third orthogonal vector $\mathbf{B}_m$.
[0042] The curvature $\kappa$ is defined as the magnitude of the derivative vector of the unit tangent vector with respect to the arc length parameter; under discrete resampling, the arc length parameter can be approximated by the cumulative chord length of the curve segment. Because the resampling intervals are equal and the curve is C² smooth, approximating the arc length with the parameter $s$ has first-order accuracy. Therefore, the curvature can be calculated directly from the first-order derivative vector and the second-order derivative vector. The instantaneous rate of change of curvature $\dot{\kappa}$ is the derivative of the curvature with respect to the arc length parameter $s$. The instantaneous torsion $\tau$ is defined as the inner product of the derivative of the second vector with respect to the arc length parameter and the third vector in the Frenet frame vector sequence. In the three-dimensional case, the torsion can be obtained from the triple product of the derivative vectors of the curve; in high-dimensional space, the instantaneous torsion is $\tau_m = \langle d\mathbf{N}_m/ds,\ \mathbf{B}_m \rangle$, where $d\mathbf{N}_m/ds$ is approximated by difference.
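As a non-limiting illustration, the first three Frenet frame vectors and the curvature at one resampling point may be computed in Python as follows. The curvature expression used here, the norm of the component of the second derivative orthogonal to the tangent divided by the squared speed, is the standard formula for a regular curve in any dimension and is an assumption as to the exact expression intended; degenerate cases (zero speed or vanishing normal component) are not handled, and the helper names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def scale(a, k):
    return [x * k for x in a]


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def frenet_frame(d1, d2, d3):
    """Return (T, N, B, curvature) from the first three derivative vectors."""
    speed = norm(d1)
    t = scale(d1, 1.0 / speed)
    # Remove the tangential component of the second derivative.
    n_raw = sub(d2, scale(t, dot(d2, t)))
    n = scale(n_raw, 1.0 / norm(n_raw))
    # Remove the tangential and normal components of the third derivative.
    b_raw = sub(sub(d3, scale(t, dot(d3, t))), scale(n, dot(d3, n)))
    b = scale(b_raw, 1.0 / norm(b_raw))
    curvature = norm(n_raw) / speed**2
    return t, n, b, curvature


# Helix (cos s, sin s, s) at s = 0, whose curvature is 1/2.
T, N, B, kappa = frenet_frame([0.0, 1.0, 1.0], [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0])
```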
[0043] From the current sliding time window, a subset of embedding vectors aligned with the time segment of the continuous behavior curve is extracted; this subset contains all embedding vectors whose normalized time parameter falls within the window interval $[s_L, s_R]$. Let the subset contain $K$ embedding vectors, denoted as $x_1, \ldots, x_K$. The probability density function of this embedding vector subset is obtained using the kernel density estimation method. A Gaussian kernel is chosen as the kernel function, and the bandwidth $b$ is selected according to the Silverman criterion, $b = \left(\frac{4}{d+2}\right)^{1/(d+4)} K^{-1/(d+4)}\,\bar{\sigma}$, where $\bar{\sigma}$ is the mean of the standard deviations of the $K$ embedding vectors in each dimension and $d=128$. The probability density function at any point $x$ is $\hat{f}(x) = \frac{1}{K\,(2\pi b^2)^{d/2}} \sum_{k=1}^{K} \exp\!\left(-\frac{\|x - x_k\|^2}{2b^2}\right)$. Based on this density function, the information entropy $H$ (differential entropy) is approximated using Monte Carlo integration: $Q$ points are randomly sampled within the region covered by the embedding vector subset, where $Q$ is 200, and the entropy is calculated as $H \approx -\frac{1}{Q} \sum_{q=1}^{Q} \ln \hat{f}(z_q)$, where $z_q$ represents a sample point drawn from a multivariate normal distribution parameterized by the covariance structure of the subset data. This value is the information entropy $H(t)$ within the current sliding time window.
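As a non-limiting illustration, the kernel density and Monte Carlo entropy estimate may be sketched in Python as follows. Two simplifications are assumed here and are not part of the described method: the sample points are drawn from a diagonal normal distribution (per-dimension mean and standard deviation) rather than from the full covariance structure, and the density is evaluated in log space to avoid underflow in high dimensions. All function names are hypothetical.

```python
import math
import random


def column_stats(vectors):
    """Per-dimension mean and sample standard deviation."""
    k, d = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / k for j in range(d)]
    stds = [math.sqrt(sum((v[j] - means[j]) ** 2 for v in vectors) / (k - 1))
            for j in range(d)]
    return means, stds


def silverman_bandwidth(vectors, stds):
    k, d = len(vectors), len(vectors[0])
    sigma = sum(stds) / d  # mean of the per-dimension standard deviations
    return (4.0 / (d + 2)) ** (1.0 / (d + 4)) * k ** (-1.0 / (d + 4)) * sigma


def log_density(x, vectors, b):
    """Logarithm of the Gaussian kernel density estimate at point x."""
    d = len(x)
    log_norm = -math.log(len(vectors)) - 0.5 * d * math.log(2 * math.pi * b * b)
    exps = [-sum((xi - vi) ** 2 for xi, vi in zip(x, v)) / (2 * b * b)
            for v in vectors]
    peak = max(exps)  # log-sum-exp for numerical stability
    return log_norm + peak + math.log(sum(math.exp(e - peak) for e in exps))


def window_entropy(vectors, q=200, seed=0):
    """Monte Carlo estimate of the entropy term for one window."""
    rng = random.Random(seed)
    means, stds = column_stats(vectors)
    b = silverman_bandwidth(vectors, stds)
    total = 0.0
    for _ in range(q):
        # Assumed simplification: diagonal normal sampling distribution.
        z = [rng.gauss(mu, sd) for mu, sd in zip(means, stds)]
        total += log_density(z, vectors, b)
    return -total / q
```

A useful sanity check on such a sketch is the scaling behavior of differential entropy: multiplying every vector by a constant factor $a$ should raise the estimate by $d \ln a$.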
[0044] Based on the current user's operation and maintenance role and the current time period, the historical entropy sequence generated by the same role within the same time period is extracted from the historical operation and maintenance session entropy storage. Roles are categorized according to organizational function, such as database administrator or system administrator. Time periods are divided by hour, with periods shorter than one hour grouped by their actual span. The historical entropy sequence is derived from the entropy records calculated for all session windows of the same role within the same time period over the past $W$ days; $W$ is taken as 14 days by default to cover two-week periodic behavior patterns. The extracted historical entropy sequence $H_1, \ldots, H_n$ is arranged in ascending order of time, and its exponential moving average, $\bar{H}_j = \lambda H_j + (1-\lambda)\,\bar{H}_{j-1}$ with $\bar{H}_1 = H_1$, is calculated; the final value $\bar{H}_n$ is taken as the dynamic baseline entropy $\bar{H}(t)$ at the current time $t$. The smoothing factor is $\lambda = 0.1$, which allows the baseline entropy to adapt slowly to gradual changes in operation and maintenance behavior patterns.
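As a non-limiting illustration, the exponential moving average over the time-ordered historical entropy sequence may be sketched in Python as follows. Initializing the average with the oldest record is an assumption made here, as is the function name.

```python
def dynamic_baseline_entropy(history, smoothing=0.1):
    """Exponential moving average of a time-ordered entropy history."""
    if not history:
        raise ValueError("history must contain at least one entropy record")
    baseline = history[0]  # assumed initialization with the oldest record
    for value in history[1:]:
        baseline = smoothing * value + (1.0 - smoothing) * baseline
    return baseline


baseline = dynamic_baseline_entropy([1.0, 2.0])
```

With a smoothing factor of 0.1, a single outlying record moves the baseline by only one tenth of its deviation, which matches the slow adaptation described above.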
[0045] In one embodiment of the present invention, in S4, the dynamic anomaly index $\Psi(t)$ at the current time $t$ is calculated by the formula [formula missing], where $T_w$ is the length of the sliding time window, $\dot{\kappa}(t)$ is the instantaneous rate of change of curvature at time $t$, $\tau(t)$ is the instantaneous torsion at time $t$, $[\cdot]_+$ represents the positive part operation, $H(t)$ is the information entropy at time $t$, $\bar{H}(t)$ is the dynamic baseline entropy at time $t$, and $\varepsilon$ is the preset minimal positive number.
[0046] Specifically, the symbol $[\cdot]_+$ represents the positive part operation, defined for any real number $x$ as $[x]_+ = \max(x, 0)$. Here the value is retained only when the product of the instantaneous rate of change of curvature and the instantaneous torsion is positive. A positive product means that the direction of curvature change and the direction of torsion have the same sign. In differential geometry, this corresponds to the curve bending and twisting cooperatively toward one side, which usually represents an aggravated abnormal deflection in the operation. For example, when a high-risk command (such as a wget command that downloads and executes a malicious script) is suddenly inserted into a normal sequence of viewing commands in an operation and maintenance session, the corresponding embedding vector deviates from its original smooth trajectory in the manifold space, causing a sudden increase in the instantaneous rate of change of curvature and a sudden change in the sign of the torsion, which the curvature-torsion synergy term detects as a high value. If the product is negative, the direction of deflection is opposite to the direction of torsion, which is a self-correction within the normal fluctuation range and is not included in the anomaly accumulation. The direction of the unit tangent vector $\mathbf{T}$ is consistent with the direction of motion of the embedding vector on the manifold; the second orthogonal vector $\mathbf{N}$ gives the instantaneous bending direction of the curve; and the third orthogonal vector $\mathbf{B}$, generated by orthogonalization of the third derivative, has a direction related to the torsion direction of the Frenet frame. Under this convention, the sign of the product $\dot{\kappa}(t)\,\tau(t)$ is uniquely determined: a positive value represents the synergistic deflection of increased bending and Frenet frame torsion, while a negative value represents their mutual cancellation. The sign does not flip due to the arbitrariness of the Frenet frame's orientation. The minimal positive number $\varepsilon$ is used to prevent numerical anomalies caused by a zero denominator; its value is no greater than [value missing].
[0047] In one embodiment of the present invention, in step S5, the time derivative of the detection threshold is obtained by inputting the dynamic anomaly index and the environmental context vector, composed of the target device characteristics and the network security situation information, into a preset neural ordinary differential equation model, and a continuously changing detection threshold is generated by numerical integration, including the following steps: Before the first detection, the statistical quantile of the dynamic anomaly index collected during historical attack-free periods is calculated, and this statistical quantile is used as the initial detection threshold. At and after the first detection, the initial detection threshold is used as the initial value, and the detection threshold is modeled as the state of a continuous-time dynamic system. The fully connected neural network in the neural ordinary differential equation model takes the concatenated vector of the current detection threshold, the dynamic anomaly index, and the environmental context vector as input and outputs the time derivative of the detection threshold, so that the rate of change of the detection threshold is determined by three factors: the current threshold level, the degree of anomaly, and the operation and maintenance environment. An adaptive step-size numerical integrator generates a detection threshold that changes continuously with time by numerically integrating the time derivative starting from the initial detection threshold. The adaptive step-size numerical integrator automatically adjusts the integration step size according to the change of the time derivative.
[0048] Specifically, before the initial detection, an initial detection threshold is determined, enabling the detection engine to correctly compare the anomaly index with the threshold from the moment of startup. The initial detection threshold is derived from statistical analysis of dynamic anomaly index samples collected during historical attack-free periods. These historical attack-free periods are defined as the initial continuous runtime segments confirmed by security audits to contain no attack behavior, typically 7 days in length. During this period, the dynamic anomaly index for each window is calculated at the same sliding window frequency as in formal detection, and the anomaly index values from all windows are aggregated into a sample set. Calculate the 95th percentile of this set; the resulting value is the initial detection threshold. The 95th percentile is chosen instead of the maximum value because it can exclude abnormal spikes caused by sudden business peaks or instantaneous system fluctuations during historical attack-free periods, ensuring that the initial threshold robustly represents the upper bound of normal behavior fluctuations. If the number of window samples accumulated during historical attack-free periods is less than 100, the maximum value among all samples is temporarily used as the initial threshold, and the method automatically switches back to the 95th percentile once sufficient samples are collected.
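As a non-limiting illustration, the selection of the initial detection threshold may be sketched in Python as follows. The use of linear interpolation between order statistics for the 95th percentile is an assumption made here, since the percentile convention is not specified; the function name is hypothetical.

```python
def initial_threshold(samples, min_samples=100, quantile=0.95):
    """Initial detection threshold from attack-free anomaly index samples."""
    ordered = sorted(samples)
    if len(ordered) < min_samples:
        # Too few windows collected: temporarily fall back to the maximum.
        return ordered[-1]
    # Assumed convention: linear interpolation between order statistics.
    position = quantile * (len(ordered) - 1)
    low = int(position)
    high = min(low + 1, len(ordered) - 1)
    fraction = position - low
    return ordered[low] + fraction * (ordered[high] - ordered[low])


few = initial_threshold([3.0, 9.0, 1.0])
many = initial_threshold([float(v) for v in range(1, 102)])
```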
[0049] The environmental context vector c(t) is a fixed-dimensional real vector with 16 dimensions. The first eight dimensions represent target device characteristics, specifically: one-hot encoding of the target device type (5 dimensions); target device importance level (1 dimension); target device operating system type (1 dimension); and whether the target device is exposed to the internet (1 dimension). The latter eight dimensions represent network security situation information, specifically: current network-wide attack alert level (2 dimensions); current operation and maintenance session time period (2 dimensions); number of abnormal logins on the same network segment in the past hour (1 dimension); frequency of scanning and probing on the same network segment in the past hour (1 dimension); threat intelligence matching level (1 dimension); and the strength of the authentication method used in the current operation and maintenance session (1 dimension). For example, the importance level of target devices is mapped to four levels, low, medium, high, and critical, with values of 0.0, 0.33, 0.67, and 1 respectively. Other continuous situational data, such as the number of abnormal logins and the frequency of scanning and probing, are normalized to the [0,1] interval using Min-Max normalization. Discrete data, such as target device type and operating system type, are encoded using one-hot encoding. The authentication method strength is quantified as password authentication = 0, two-factor authentication = 0.5, and certificate authentication = 1. The specific value rules for each dimension of the environmental context vector can be adjusted according to the network environment during actual deployment, but its dimensional structure remains fixed to ensure the stability of the input interface of the neural ordinary differential equation model.
[0050] The dynamic anomaly index Ψ(t) is concatenated with the environmental context vector c(t) and combined with the detection threshold Θ(t) at the current moment to form the complete input vector $z(t) = [\Theta(t);\ \Psi(t);\ c(t)]$ of the neural ordinary differential equation model. The core of the neural ordinary differential equation model is a fully connected neural network that receives the 18-dimensional input and outputs a scalar value, which is the time derivative of the detection threshold. The network structure is a multilayer perceptron with two hidden layers. The first hidden layer contains 32 neurons, the second hidden layer also contains 32 neurons, and the activation function for both is the hyperbolic tangent function tanh. The output layer is a single linear neuron without an activation function. The computation is $h_1 = \tanh(W_1 z(t) + b_1)$, $h_2 = \tanh(W_2 h_1 + b_2)$, $d\Theta/dt = W_3 h_2 + b_3$, where $W_1 \in \mathbb{R}^{32 \times 18}$, $b_1 \in \mathbb{R}^{32}$, $W_2 \in \mathbb{R}^{32 \times 32}$, $b_2 \in \mathbb{R}^{32}$, $W_3 \in \mathbb{R}^{1 \times 32}$, and $b_3 \in \mathbb{R}$ are the network weights and bias parameters; the weights are initialized using Xavier uniform initialization and the biases are initialized to zero. The output range of the hyperbolic tangent activation function is (−1, 1), which keeps the fluctuation range of $d\Theta/dt$ constrained within a reasonable range and avoids sudden threshold changes.
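As a non-limiting illustration, the forward pass of such a fully connected network may be sketched in Python as follows. The class name `ThresholdDerivativeNet`, the input ordering (threshold, anomaly index, context vector), and the fixed random seed are assumptions introduced here for illustration; training is not shown.

```python
import math
import random


def xavier_uniform(rows, cols, rng):
    """Xavier uniform initialization for a rows x cols weight matrix."""
    bound = math.sqrt(6.0 / (rows + cols))
    return [[rng.uniform(-bound, bound) for _ in range(cols)]
            for _ in range(rows)]


def dense_tanh(weights, biases, inputs):
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


class ThresholdDerivativeNet:
    """Two hidden tanh layers of 32 units and one linear output neuron."""

    def __init__(self, input_dim=18, hidden=32, seed=0):
        rng = random.Random(seed)
        self.w1 = xavier_uniform(hidden, input_dim, rng)
        self.b1 = [0.0] * hidden
        self.w2 = xavier_uniform(hidden, hidden, rng)
        self.b2 = [0.0] * hidden
        self.w3 = xavier_uniform(1, hidden, rng)
        self.b3 = 0.0

    def __call__(self, theta, psi, context):
        z = [theta, psi] + list(context)  # 1 + 1 + 16 = 18 inputs
        h1 = dense_tanh(self.w1, self.b1, z)
        h2 = dense_tanh(self.w2, self.b2, h1)
        # Linear output: time derivative of the detection threshold.
        return sum(w * h for w, h in zip(self.w3[0], h2)) + self.b3


net = ThresholdDerivativeNet(seed=0)
derivative = net(0.5, 0.2, [0.0] * 16)
```

Because the last hidden layer is bounded by tanh, the magnitude of the output in this sketch cannot exceed the sum of the absolute output weights plus the absolute output bias.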
[0051] Starting from the initial detection threshold $\Theta_0$, the time derivative $d\Theta/dt$ is numerically integrated along the time direction to generate a detection threshold that changes continuously over time. The numerical integrator employs the Dormand-Prince method, which provides a fourth-order approximation $\Theta^{(4)}$ and a fifth-order approximation $\Theta^{(5)}$ for each integration step. The absolute value of the difference between the two, $\mathrm{err} = |\Theta^{(5)} - \Theta^{(4)}|$, is the local truncation error at that step.
[0052] The integrator is initialized at the first detection moment $t_{\mathrm{init}}$: the current integration time $t_{\mathrm{cur}}$ is set to $t_{\mathrm{init}}$, the current detection threshold $\Theta_{\mathrm{cur}}$ is set to $\Theta_0$, and the initial trial step size $\Delta t$ is set to 1.0 second. At each step, starting from the current state, the fully connected neural network that provides the derivative value is invoked, $\Theta^{(4)}$ and $\Theta^{(5)}$ are calculated according to the Dormand-Prince coefficient table, and the local truncation error err is obtained. err is then compared with the preset tolerance tol = [value missing]: if err > tol, the result of this step is rejected, the step size is reduced, and the step is recalculated; if err ≤ tol, the result of this step is accepted, $t_{\mathrm{cur}}$ is updated to $t_{\mathrm{cur}} + \Delta t$, $\Theta_{\mathrm{cur}}$ is updated to the newly computed value, and the step size is increased for the next step; for example, the step size is updated and scaled according to the formula [formula missing]. When the dynamic anomaly index fluctuates drastically, the time derivative of the threshold changes more significantly and the local truncation error also increases, so the integrator automatically reduces the step size to capture instantaneous changes in the detection threshold at a higher frequency. During periods of stable business operation, the anomaly index fluctuates gently, and the integrator automatically increases the step size to reduce real-time computational overhead while ensuring detection accuracy.
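As a non-limiting illustration, an adaptive Dormand-Prince integrator for a scalar state may be sketched in Python as follows. The coefficients are the standard Dormand-Prince 5(4) tableau. The tolerance value, the step-size update rule (a safety factor of 0.9 with exponent 1/5, clipped to the range 0.2 to 5), and the choice to propagate the fifth-order result are common conventions assumed here, because the corresponding values are not given above; the function names are hypothetical.

```python
import math

# Standard Dormand-Prince 5(4) coefficients.
C = (0.0, 1 / 5, 3 / 10, 4 / 5, 8 / 9, 1.0, 1.0)
A = (
    (),
    (1 / 5,),
    (3 / 40, 9 / 40),
    (44 / 45, -56 / 15, 32 / 9),
    (19372 / 6561, -25360 / 2187, 64448 / 6561, -212 / 729),
    (9017 / 3168, -355 / 33, 46732 / 5247, 49 / 176, -5103 / 18656),
    (35 / 384, 0.0, 500 / 1113, 125 / 192, -2187 / 6784, 11 / 84),
)
B5 = (35 / 384, 0.0, 500 / 1113, 125 / 192, -2187 / 6784, 11 / 84, 0.0)
B4 = (5179 / 57600, 0.0, 7571 / 16695, 393 / 640,
      -92097 / 339200, 187 / 2100, 1 / 40)


def dopri_step(f, t, y, h):
    """One trial step: fourth-order value, fifth-order value, error estimate."""
    k = []
    for c_i, a_i in zip(C, A):
        y_i = y + h * sum(a * k_j for a, k_j in zip(a_i, k))
        k.append(f(t + c_i * h, y_i))
    y5 = y + h * sum(b * k_j for b, k_j in zip(B5, k))
    y4 = y + h * sum(b * k_j for b, k_j in zip(B4, k))
    return y4, y5, abs(y5 - y4)


def integrate(f, t_start, y_start, t_end, h=1.0, tol=1e-6):
    """Integrate dy/dt = f(t, y) from t_start to t_end with adaptive steps."""
    t, y = t_start, y_start
    while t < t_end:
        h = min(h, t_end - t)  # do not step past the end time
        _, y5, err = dopri_step(f, t, y, h)
        if err <= tol:
            t, y = t + h, y5  # accept the step (assumed: keep 5th-order value)
        # Assumed step-size controller; a rejected step is retried with smaller h.
        if err == 0.0:
            factor = 5.0
        else:
            factor = min(5.0, max(0.2, 0.9 * (tol / err) ** 0.2))
        h *= factor
    return y


# Check against dy/dt = -y, y(0) = 1, whose exact solution is exp(-t).
approx = integrate(lambda t, y: -y, 0.0, 1.0, 1.0)
```

In the described method, the right-hand side `f` would be the fully connected network evaluated on the current threshold, anomaly index, and context vector, rather than the test equation used here.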
[0053] The integrator advances in sync with the sliding window. Whenever the window is updated and a new dynamic anomaly index and environmental context vector are generated, the integrator continues to advance based on the previously reached time and threshold, and outputs the threshold corresponding to each time. All output thresholds form a continuously changing detection threshold curve Θ(t) along the time axis.
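As a non-limiting illustration, the accept/reject loop of the adaptive integrator can be sketched as follows for a scalar threshold. The tableau is the standard Dormand-Prince 5(4) coefficient table. The tolerance value, the step-scaling rule `0.9 * (tol / err) ** 0.2` with clamping to [0.2, 5], and the choice of propagating the fifth-order result are assumptions made for this sketch, since the corresponding values are not legible in this text.

```python
# Standard Dormand-Prince 5(4) Butcher tableau.
C = [0.0, 1/5, 3/10, 4/5, 8/9, 1.0, 1.0]
A = [
    [],
    [1/5],
    [3/40, 9/40],
    [44/45, -56/15, 32/9],
    [19372/6561, -25360/2187, 64448/6561, -212/729],
    [9017/3168, -355/33, 46732/5247, 49/176, -5103/18656],
    [35/384, 0.0, 500/1113, 125/192, -2187/6784, 11/84],
]
B5 = [35/384, 0.0, 500/1113, 125/192, -2187/6784, 11/84, 0.0]
B4 = [5179/57600, 0.0, 7571/16695, 393/640, -92097/339200, 187/2100, 1/40]

def dopri_step(f, t, y, h):
    """One trial step: return the 5th-order value and |y5 - y4| as the error."""
    k = []
    for i in range(7):
        yi = y + h * sum(a * kj for a, kj in zip(A[i], k))
        k.append(f(t + C[i] * h, yi))
    y5 = y + h * sum(b * kj for b, kj in zip(B5, k))
    y4 = y + h * sum(b * kj for b, kj in zip(B4, k))
    return y5, abs(y5 - y4)

def integrate_threshold(f, t0, theta0, t_end, h=1.0, tol=1e-6):
    """Integrate dTheta/dt = f(t, Theta) from t0 to t_end with adaptive steps."""
    t, theta = t0, theta0
    curve = [(t, theta)]
    while t < t_end:
        h = min(h, t_end - t)                 # do not step past the end time
        theta_new, err = dopri_step(f, t, theta, h)
        # Assumed scaling rule; clamped so the step never changes too abruptly.
        scale = 0.9 * (tol / err) ** 0.2 if err > 0 else 5.0
        scale = min(5.0, max(0.2, scale))
        if err <= tol:                        # accept: advance time and threshold
            t, theta = t + h, theta_new
            curve.append((t, theta))
        h *= scale                            # shrink after a rejection, grow otherwise
    return curve
```

In a deployment matching the description, `f` would call the derivative network with the latest Ψ(t) and c(t), and `integrate_threshold` would be resumed from the last reached time and threshold each time the sliding window is updated.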
[0054] In one embodiment of the present invention, step S5 involves obtaining the verification result of an abnormal alarm from a preset alarm verification database, converting the verification result into a parameter adjustment signal, and updating the parameters of the neural ordinary differential equation model using the adjoint sensitivity method, including the following steps: Retrieve verification results corresponding to abnormal alarms from the preset alarm verification database. Verification results include confirmation of attack and false alarms. When the verification result confirms an attack, the corresponding parameter adjustment signal is assigned a positive value of one; when the verification result is a false alarm, the corresponding parameter adjustment signal is assigned a negative value of one. Define a cumulative parameter adjustment signal function from the initial time to the current time. This function is based on the difference between the detection threshold at the alarm time and the dynamic anomaly index, which is mapped and then weighted by the parameter adjustment signal. With the goal of maximizing the cumulative parameter adjustment signal function, the parameter gradient of the fully connected neural network is calculated by the adjoint sensitivity method, and the parameters of the fully connected neural network are updated along the rising direction of the gradient.
[0055] Specifically, the alarm verification database stores all previous abnormal alarms and their corresponding manual verification conclusions. Whenever an abnormal alarm is generated, it is assigned a unique alarm identifier and written to the alarm verification database, initially marked as "pending verification." Verification conclusions are limited to two types: confirmed attack and false alarm. Each record in the alarm verification database contains at least the alarm identifier, alarm generation time, corresponding dynamic anomaly index, verification conclusion, and verification timestamp. This database provides a reliable source of supervision signals for parameter updates. For each verified alarm record, the original generation time of the alarm is extracted, and a parameter adjustment signal r is generated based on the verification conclusion: when the verification conclusion is a confirmed attack, r = +1; when the verification conclusion is a false alarm, r = −1. During the subsequent reverse integration, the parameter adjustment signal r is strictly mapped and anchored to the alarm generation time, that is, the moment at which the alarm was originally generated, rather than the moment at which the manual verification is completed.
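As a non-limiting illustration, an alarm record and its mapping to the parameter adjustment signal might look as follows. The class name, field names and status strings are hypothetical; only the listed fields and the +1 / −1 mapping come from the description.

```python
from dataclasses import dataclass
from typing import Optional

PENDING = "pending_verification"
CONFIRMED_ATTACK = "confirmed_attack"
FALSE_ALARM = "false_alarm"

@dataclass
class AlarmRecord:
    alarm_id: str                        # unique alarm identifier
    generated_at: float                  # alarm generation time, in seconds
    anomaly_index: float                 # dynamic anomaly index at generation time
    conclusion: str = PENDING            # verification conclusion
    verified_at: Optional[float] = None  # timestamp of the manual verification

def adjustment_signal(record):
    """Return (anchor_time, r) for a verified record, or None while pending.

    The signal is anchored to the alarm generation time, not to verified_at.
    """
    if record.conclusion == CONFIRMED_ATTACK:
        return record.generated_at, 1
    if record.conclusion == FALSE_ALARM:
        return record.generated_at, -1
    return None
```

Keeping `verified_at` separate from `generated_at` makes the anchoring rule explicit: a verification that arrives minutes or hours later still contributes its signal at the original alarm time.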
[0056] Let the time interval be $[t_0, T]$, within which a total of $W$ alarms are generated. The w-th alarm occurs at time $t_w$, and the corresponding parameter adjustment signal is $r_w \in \{+1, -1\}$. The cumulative parameter adjustment signal function is $J = \sum_{w=1}^{W} r_w \, \sigma(\Theta(t_w) - \Psi(t_w))$, where $\Theta(t_w)$ is the detection threshold at the time of the alarm, $\Psi(t_w)$ is the dynamic anomaly index at the time of the alarm, and $\sigma(\cdot)$ is a monotonically decreasing function used to map the difference between the threshold and the anomaly index to a confidence measure of the decision boundary. The inverted form of the logistic function is chosen, i.e. $\sigma(z) = 1 / (1 + e^{cz})$, where c is a preset scaling factor chosen so that σ exhibits a steep transition near $z = \Theta(t_w) - \Psi(t_w) = 0$, approximating an indicator function while maintaining differentiability. Since this function only sums over triggered alarm events, the alarm time always satisfies $\Psi(t_w) > \Theta(t_w)$, and therefore the value of σ is close to 1. The essence of the cumulative parameter adjustment signal function J is to assign positive weights to attack alarms, contributing positive signals that encourage a lower detection threshold in this context, and to assign negative weights to false alarms, penalizing a lower detection threshold in this context, thereby driving the detection threshold adjustment to distinguish between attacks and false alarms. Maximizing J is equivalent to making the detection threshold decision boundary cover as many attack alarms as possible while triggering as few false alarms as possible.
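As a non-limiting illustration, the inverted logistic mapping and the cumulative parameter adjustment signal function can be sketched as follows. The value c = 10 is an assumption chosen only to show a steep transition; the scaling factor used in the embodiment is not legible in this text.

```python
import math

def sigma(z, c=10.0):
    """Inverted logistic 1 / (1 + exp(c*z)); decreasing in z, evaluated stably."""
    x = c * z
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))

def cumulative_signal(alarms, c=10.0):
    """J = sum of r * sigma(threshold - anomaly_index) over verified alarms.

    `alarms` is an iterable of (r, threshold, anomaly_index) tuples, r in {+1, -1}.
    """
    return sum(r * sigma(theta - psi, c) for r, theta, psi in alarms)
```

For a false alarm (r = −1) raising the threshold toward or above the anomaly index shrinks σ and therefore raises J, while for a confirmed attack (r = +1) lowering the threshold raises J, which matches the intent described above.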
[0057] In the neural ordinary differential equation model, the parameters of the fully connected neural network are denoted as θ, which is the set of all projection matrices and bias vectors in claim 7. The update objective of θ is to maximize the cumulative parameter adjustment signal function J; therefore, the gradient $\partial J / \partial \theta$ needs to be calculated. J depends on the detection threshold $\Theta(t_w)$ at each alarm time $t_w$, and Θ(t) is obtained by integrating the derivative output by the fully connected neural network $f_\theta$, so applying the conventional chain rule through every integration step incurs too much overhead. The adjoint sensitivity method transforms the gradient calculation into a single reverse adjoint integration.
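[0058] Define the adjoint variable $\lambda(t) = \partial J / \partial \Theta(t)$. Since Θ(t) is a scalar, $\lambda(t)$ and $\partial f_\theta / \partial \Theta$ are both scalars. The adjoint variable evolves in reverse from the terminal time T, and the terminal condition is $\lambda(T) = 0$. During a continuous period without alarms, it satisfies $d\lambda/dt = -\lambda(t) \, \partial f_\theta / \partial \Theta$, where $\partial f_\theta / \partial \Theta$ is the partial derivative of the output of the fully connected neural network $f_\theta$ with respect to the detection threshold input, obtained by automatic differentiation. At each alarm time $t_w$, a jump is applied: $\lambda(t_w^-) = \lambda(t_w^+) + r_w \, \sigma'(\Theta(t_w) - \Psi(t_w))$, where $r_w$ is the parameter adjustment signal and $\sigma'$ is the derivative of σ. After the reverse integration is completed, the parameter gradient is given by $\partial J / \partial \theta = \int_{t_0}^{T} \lambda(t) \, \partial f_\theta / \partial \theta \, dt$, where $\partial f_\theta / \partial \theta$ is likewise obtained at each step by automatic differentiation. This integral is calculated along the reverse time grid, with the local parameter sensitivity $\partial f_\theta / \partial \theta$ at each time step weighted by $\lambda(t)$ and accumulated.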
[0059] After the gradient $\partial J / \partial \theta$ is obtained, the network parameters are updated using the gradient ascent method. To avoid abrupt changes in threshold dynamics caused by excessively large parameter updates, a small preset learning rate η is adopted. The parameter update rule is $\theta \leftarrow \theta + \eta \, \partial J / \partial \theta$. Updates can be triggered after each alarm verification, or performed in small batches after a certain number of verification results have accumulated. To maintain detection stability, updates are enabled only when the normal operational baseline is confirmed to be free of attacks, which prevents attack patterns from being incorrectly learned during peak attack periods. When the cumulative parameter adjustment signal function no longer increases, or the verification accuracy has converged after multiple consecutive updates, parameter updates are paused to avoid overfitting.
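As a non-limiting illustration, the reverse adjoint pass with jumps at alarm times, followed by one gradient ascent step, can be sketched on a toy problem. Everything specific here is an assumption made to keep the sketch short and checkable: the two-parameter linear function `f` stands in for the fully connected network, explicit Euler on a fixed grid stands in for the Dormand-Prince integrator, c = 10 is an arbitrary scaling factor, and alarms are given as a mapping from grid index to r.

```python
import math

def sigma(z, c=10.0):
    """Inverted logistic 1 / (1 + exp(c*z)), evaluated stably."""
    x = c * z
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))

def dsigma(z, c=10.0):
    """Derivative of sigma with respect to z: -c * sigma * (1 - sigma)."""
    s = sigma(z, c)
    return -c * s * (1.0 - s)

def f(theta_state, psi, p):
    """Toy derivative model: p[0] is a gain toward psi, p[1] a constant drift."""
    return p[0] * (psi - theta_state) + p[1]

def forward(p, theta0, psi_seq, h):
    """Explicit Euler trajectory of the threshold on the grid of psi_seq."""
    traj = [theta0]
    for psi in psi_seq[:-1]:
        traj.append(traj[-1] + h * f(traj[-1], psi, p))
    return traj

def objective(p, theta0, psi_seq, h, alarms):
    """J = sum over alarms of r * sigma(threshold - anomaly index)."""
    traj = forward(p, theta0, psi_seq, h)
    return sum(r * sigma(traj[k] - psi_seq[k]) for k, r in alarms.items())

def adjoint_gradient(p, theta0, psi_seq, h, alarms):
    """Reverse pass: lambda starts at 0 at T, jumps at alarms, accumulates dJ/dp."""
    traj = forward(p, theta0, psi_seq, h)
    n = len(traj) - 1

    def jump(k):
        r = alarms.get(k, 0)
        return r * dsigma(traj[k] - psi_seq[k]) if r else 0.0

    lam = jump(n)                      # zero terminal condition plus any alarm at T
    grad = [0.0, 0.0]
    for k in range(n - 1, -1, -1):
        grad[0] += h * lam * (psi_seq[k] - traj[k])   # lambda * df/dp[0]
        grad[1] += h * lam                            # lambda * df/dp[1]
        lam = lam * (1.0 + h * (-p[0])) + jump(k)     # df/dTheta = -p[0]
    return grad

def ascent_step(p, grad, eta=1e-3):
    """Gradient ascent: move the parameters along the rising direction of J."""
    return [pi + eta * gi for pi, gi in zip(p, grad)]
```

Because the backward recursion is the exact adjoint of the Euler forward pass, its result can be checked against finite differences of `objective`, which is a useful sanity test before swapping in an adaptive integrator and automatic differentiation.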
[0060] The node embedding matrix, all parameters of the temporal heterogeneous graph attention network, and the fully connected neural network parameters θ of the neural ordinary differential equation model are trained using an end-to-end joint training strategy, uniformly driven by the supervision signal from alarm verification feedback. The complete forward chain of a single operation and maintenance session, from the initial node feature lookup table in S1 to the output detection threshold in S5, is the same as described in steps S1 to S5. The loss function is defined as L = −J, where J is the cumulative parameter adjustment signal function. During gradient backpropagation, the adjoint variable λ(t) propagates backward from the alarm time, passing through the detection threshold, the neural ordinary differential equation model, the dynamic anomaly index, the information entropy, the rate of change of curvature, the torsion, the Frenet frame vectors, the continuous behavior curve, and the manifold parameterization process, and is finally backpropagated to the parameters of each layer of the temporal heterogeneous graph attention network and to the node embedding matrix. In this chain, Hermite interpolation, solving the three-moment equations, Gram-Schmidt orthogonalization, kernel density estimation, and the other operations are all differentiable, and their backward gradients are calculated automatically by an automatic differentiation framework. The Adam optimizer is used to update all learnable parameters, and the update triggering conditions and frequency are the same as in the parameter update strategy described above for S5.
[0061] The above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from their spirit and scope.
Claims
1. A dynamic detection method for abnormal network security operation and maintenance behavior based on artificial intelligence, characterized in that it includes the following steps: S1: Obtain operation logs from the network security operation and maintenance audit system, and simultaneously extract target device characteristics and current network security status information to construct a heterogeneous time-series behavior graph; where the nodes in the heterogeneous time-series behavior graph include user nodes, target device nodes, command nodes and session nodes, and the edges represent the operation relationships recorded in the operation logs; S2: Encode each operation in the operation log using a preset temporal heterogeneous graph attention network to generate an embedding vector; group the operations into operation sessions according to the operation time order, and arrange the embedding vectors corresponding to the operations within the same operation session according to the operation time order to form the embedding vector sequence of the operation session; perform manifold parameterization on the embedding vector sequence and apply continuity constraints for correction to obtain the continuous behavior curve of the operation session with second-order continuous differentiability in high-dimensional space; S3: Based on a preset sliding time window, calculate the Frenet frame for the continuous behavior curve segment within the current sliding time window, extract the instantaneous rate of curvature change and the instantaneous torsion, calculate the information entropy based on the embedding vectors within the sliding time window, and obtain the dynamic baseline entropy based on the embedding vector statistics of historical operation and maintenance sessions of the same role in the same period.
S4: Take the positive part of the product of the instantaneous rate of curvature change and the instantaneous torsion to obtain the twisting coordination quantity, and use the absolute value of the difference between the information entropy and the dynamic baseline entropy as the exponential decay factor; multiply the twisting coordination quantity and the exponential decay factor and integrate them within the sliding time window to obtain the dynamic anomaly index at the current moment. S5: Input the dynamic anomaly index and the environmental context vector composed of the target device characteristics and the network security situation information into a preset neural ordinary differential equation model to obtain the time-varying derivative of the detection threshold, and generate a continuously changing detection threshold through numerical integration. The dynamic anomaly index is compared with the detection threshold in real time. When the dynamic anomaly index exceeds the detection threshold, an anomaly alarm is generated. The verification result of the anomaly alarm is obtained from the preset alarm verification database. The verification result is converted into a parameter adjustment signal and the parameters of the neural ordinary differential equation model are updated by the adjoint sensitivity method.
2. The method for dynamic detection of abnormal network security operation and maintenance behavior based on artificial intelligence according to claim 1, characterized in that, S1 includes the following steps: Whenever an operation log is retrieved from the network security operation and maintenance audit system, the characteristics of the target device involved in the operation log and the current network security situation information are extracted synchronously, and the user identifier, target device identifier, command string, session identifier and operation timestamp in the operation log are parsed online. The command strings are standardized and deduplicated so that multiple command strings with the same semantics are mapped to the same command node; Based on the analysis results of the operation and maintenance logs, user nodes, target device nodes, command nodes, and session nodes that do not currently exist are dynamically created in the heterogeneous time-series behavior graph. At the same time, typed directed time-series edges are established. The typed directed time-series edges include execution edges from user nodes to command nodes, operation edges from command nodes to target device nodes, ownership edges from command nodes to session nodes, and initiation edges from user nodes to session nodes. Each edge carries an operation timestamp.
3. The method for dynamic detection of abnormal network security operation and maintenance behavior based on artificial intelligence according to claim 1, characterized in that, In step S2, each operation in the operation log is encoded using a preset temporal heterogeneous graph attention network to generate an embedding vector, including the following steps: For each edge in the heterogeneous temporal behavior graph, an edge attribute vector is constructed. The edge attribute vector is formed by concatenating the one-hot encoding of the edge type and the Fourier encoding of the operation timestamp. In each layer of the pre-defined temporal heterogeneous graph attention network, the command node corresponding to the operation and maintenance is taken as the central node, and the node directly connected to the central node is taken as the neighbor node. The query vector and key vector of the central node and the neighbor node are obtained by linear projection of the node type, respectively, and then the dot product operation is performed. The edge attribute vector of the connecting edge is linearly projected and added to the dot product result as a bias term. After normalization, the attention weight is obtained. Then, the value vector of the neighbor node is weighted and aggregated using the attention weight. After residual connection and layer normalization, the result of this layer is output. After stacking multiple layers, the embedding vector is obtained.
4. The method for dynamic detection of abnormal network security operation and maintenance behavior based on artificial intelligence according to claim 3, characterized in that, in step S2, performing manifold parameterization on the embedding vector sequence and applying continuity constraints for correction to obtain the continuous behavior curve of the operation and maintenance session with second-order continuous differentiability in high-dimensional space includes the following steps: For the embedding vector sequence arranged in chronological order within the operation and maintenance session, each embedding vector in the sequence is taken as a sampling point, and the tangent vector at each sampling point is calculated from the position difference between adjacent sampling points; Between adjacent sampling points, a piecewise cubic Hermite interpolation polynomial is constructed using the positions of the sampling points and the tangent vectors, so that adjacent polynomial segments satisfy positional continuity and first-derivative continuity at the splicing point. By applying a second-order continuity constraint to the tangent vectors of all sampling points along the entire curve, and correcting the tangent vector at each point by solving the three-moment equations, a continuous behavior curve with second-order continuous differentiability is obtained.
5. The method for dynamic detection of abnormal network security operation and maintenance behavior based on artificial intelligence according to claim 1, characterized in that S3 includes the following steps: The continuous behavior curve segments within a preset sliding time window are resampled at equal intervals according to normalized time parameters to obtain multiple discrete sampling points. The first-order derivative vector and the second-order derivative vector at each sampling point are calculated by numerical difference. The direction of the first-order derivative vector is used as the unit tangent vector. The second-order derivative vector is orthogonalized and normalized with respect to the unit tangent vector to obtain the second orthogonal vector. The third-order derivative vector is orthogonalized and normalized with respect to the unit tangent vector and the second orthogonal vector to obtain the third orthogonal vector. Thus, a sequence of mutually orthogonal Frenet frame vectors in high-dimensional space is generated. The magnitude of the derivative vector of the unit tangent vector with respect to the arc length parameter is calculated as the curvature. The instantaneous rate of change of curvature is obtained by taking the derivative of the curvature with respect to the arc length parameter. The inner product of the derivative of the second vector with respect to the arc length parameter and the third vector in the Frenet frame vector sequence is taken as the instantaneous torsion.
Extract a subset of embedded vectors that are aligned with the time segment of the continuous behavior curve within the current sliding time window, and use kernel density estimation to obtain the probability density function of the embedded vector subset; then calculate the differential entropy based on the probability density function as the information entropy; Based on the current user's operation and maintenance role and the current time period, extract the historical information entropy generated by the same operation and maintenance role in the same time period from the historical operation and maintenance session information entropy storage, calculate the exponential moving average of the historical information entropy, and obtain the dynamic baseline entropy.
6. The method for dynamic detection of abnormal network security operation and maintenance behavior based on artificial intelligence according to claim 1, characterized in that, in S4, the dynamic anomaly index Ψ(t) at the current time t is calculated as: $\Psi(t) = \int_{t-\Delta}^{t} [\dot{\kappa}(s)\,\tau(s)]_+ \, D(s) \, ds$; where Δ is the length of the sliding time window, $\dot{\kappa}(s)$ is the instantaneous rate of change of curvature at time s, $\tau(s)$ is the instantaneous torsion at time s, $[\cdot]_+$ represents the positive part operation, and $D(s)$ is the exponential decay factor, which is computed from the absolute value of the difference between the information entropy $H(s)$ at time s and the dynamic baseline entropy $H_b(s)$ at time s, together with ε, the preset minimum positive number.
7. The method for dynamic detection of abnormal network security operation and maintenance behavior based on artificial intelligence according to claim 1, characterized in that, in step S5, the dynamic anomaly index and the environmental context vector composed of the target device characteristics and the network security situation information are input into a preset neural ordinary differential equation model to obtain the time-varying derivative of the detection threshold, and a continuously changing detection threshold is generated through numerical integration, including the following steps: Before the first detection, the statistical quantile of the dynamic anomaly index collected during historical attack-free periods is calculated, and this statistical quantile is used as the initial detection threshold. At and after the first detection, the initial detection threshold is used as the initial value. The detection threshold is modeled as a continuous-time dynamic system state. Through the fully connected neural network in the neural ordinary differential equation model, the concatenated vector of the current detection threshold, dynamic anomaly index and environmental context vector is used as input to output the time-varying derivative of the detection threshold. This makes the rate of change of the detection threshold determined by three factors: the current threshold level, the degree of anomaly, and the operation and maintenance environment. An adaptive step-size numerical integrator is used to generate a detection threshold that changes continuously with time by numerically integrating the time-varying derivative from the initial detection threshold. The adaptive step-size numerical integrator automatically adjusts the integration step size according to the change of the time-varying derivative.
8. The method for dynamic detection of abnormal network security operation and maintenance behavior based on artificial intelligence according to claim 7, characterized in that, In step S5, the verification results of abnormal alarms are obtained from a preset alarm verification database, the verification results are converted into parameter adjustment signals, and the parameters of the neural ordinary differential equation model are updated using the adjoint sensitivity method. This includes the following steps: Retrieve verification results corresponding to abnormal alarms from the preset alarm verification database. Verification results include confirmation of attack and false alarms. When the verification result confirms an attack, the corresponding parameter adjustment signal is assigned a positive value of one; when the verification result is a false alarm, the corresponding parameter adjustment signal is assigned a negative value of one. Define a cumulative parameter adjustment signal function from the initial time to the current time. This function is based on the difference between the detection threshold at the alarm time and the dynamic anomaly index, which is mapped and then weighted by the parameter adjustment signal. With the goal of maximizing the cumulative parameter adjustment signal function, the parameter gradient of the fully connected neural network is calculated by the adjoint sensitivity method, and the parameters of the fully connected neural network are updated along the rising direction of the gradient.