A method and system for detecting graphical user interface operation behavior

By collecting and fusing interface control attributes and screen image data, and using liquid neural networks for continuous-time modeling, the problem of insufficient characterization of user operation process and lag in anomaly detection in existing technologies is solved, and accurate identification and real-time response to violations are achieved.

CN122196827A · Pending · Publication Date: 2026-06-12 · STATE GRID ANHUI ELECTRIC POWER CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
STATE GRID ANHUI ELECTRIC POWER CO LTD
Filing Date
2026-03-16
Publication Date
2026-06-12


Abstract

The application discloses a method and system for detecting graphical user interface operation behavior, relating to the technical field of behavior recognition. The method comprises the following steps: collecting control attributes and screen image data of a user in a graphical interface, performing time alignment and feature fusion, and generating an operation trajectory; performing continuous-time modeling of the operation trajectory using a liquid neural network; based on the output of the continuous-time modeling, analyzing the degree of operation abnormality to obtain a behavior modulation parameter, and using that parameter to modulate the dynamic parameters of the continuous-time modeling; and extracting evolutionary features from the modulated continuous-time modeling result and judging from those features whether the operation is a violation. The method addresses the following problems in existing Windows graphical operation violation identification technology: difficulty in depicting the dynamic characteristics of continuous interaction, insufficient adaptability to different operation rhythms, lagging abnormality determination, and low precision in identifying the scope of a violation's impact.

Description

Technical Field

[0001] This invention relates to the field of behavior recognition technology, and more specifically, to a method and system for detecting graphical user interface operation behavior.

Background Technology

[0002] Existing Windows graphical user interface (GUI) monitoring technologies primarily rely on rule-based or log-based post-event analysis. They determine user behavior compliance by analyzing system events, operation logs, or simple interface state changes. However, these technologies typically rely on discrete-time point data, making it difficult to depict the continuous evolution of user operations within complex graphical interfaces. Some solutions combine screenshots or control attributes for analysis, but the lack of reliable alignment and fusion mechanisms between image information and system interface data makes them susceptible to interface occlusion, dynamic changes in controls, or inconsistent data sources, leading to incomplete or unreliable operation trajectories. Existing behavior recognition methods often employ fixed-time-scale sequence models, lacking sufficient adaptability to different operation rhythms and interaction densities, and failing to accurately reflect the true dynamics of human-computer interaction.

[0003] Meanwhile, anomaly detection is usually based on static thresholds or single feature deviations, lacking the ability to jointly model the scope of the impact of operations on the system's functionality and the degree of risk. It cannot dynamically adjust the model state at the continuous time level, which can easily lead to misjudgment or missed judgment.

[0004] Therefore, there is an urgent need for a Windows graphical violation capture and recognition technology that can integrate multi-source interface semantic information, be based on continuous-time dynamics modeling, and have adaptive modulation capabilities.

Summary of the Invention

[0005] To overcome the aforementioned deficiencies of the prior art, embodiments of the present invention provide a graphical user interface operation behavior detection method and system to solve the problems of existing Windows graphical operation violation identification technology, such as difficulty in depicting continuous interactive dynamic characteristics, insufficient adaptability to different operation rhythms, delayed anomaly judgment, and low accuracy in identifying the scope of violation impact.

[0006] To achieve the above objectives, the present invention provides the following technical solution: collect the control attributes and screen image data of the user in the graphical interface, perform time alignment and feature fusion to generate an operation trajectory, and use a liquid neural network to model the operation trajectory in continuous time. Based on the output of the continuous-time modeling, analyze the degree of operation abnormality to obtain behavior modulation parameters, and use these parameters to modulate the dynamic parameters of the continuous-time modeling. Based on the modulated continuous-time modeling results, extract evolutionary features and determine from them whether the operation is a violation.

[0007] In a preferred embodiment, the continuous-time modeling of the operation trajectory is achieved through a continuous-time model with time-scale adaptability.

[0008] In a preferred embodiment, the liquid neural network adaptively captures multi-scale temporal patterns in the operational trajectory through the dynamic characteristics of its internal state changing continuously over time.

[0009] In a preferred embodiment, the step of analyzing the operational anomaly degree based on the output of continuous-time modeling to obtain behavioral modulation parameters includes: extracting operational features from the operational record data of the continuous-time modeling output to generate an operational feature vector; calculating the matching degree between the operational feature vector and the compliant operational pattern library to obtain operational confidence; determining the set of functional components affected by the current operation based on the mapping relationship between interface controls and system functional components; and obtaining time-continuous behavioral modulation parameters through nonlinear mapping based on the set of functional components and the operational confidence.
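
The chain from feature vector to modulation parameter described above can be illustrated with a minimal sketch. The source does not specify the matching measure or the nonlinear mapping, so the cosine similarity, the saturating exponential, the per-component risk weights, and all names (`operation_confidence`, `modulation_parameter`, `sharpness`) are assumptions made only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def operation_confidence(features, pattern_library):
    """Best match between the feature vector and any compliant pattern (assumed measure)."""
    return max(cosine(features, pattern) for pattern in pattern_library)


def modulation_parameter(confidence, affected, component_risk, sharpness=3.0):
    """Map low confidence and high component impact to a value in [0, 1).

    component_risk is a hypothetical table of per-component risk weights.
    """
    impact = sum(component_risk.get(name, 0.0) for name in affected)
    return 1.0 - math.exp(-sharpness * (1.0 - confidence) * impact)
```

With this form, a fully compliant operation (confidence 1.0) yields a modulation parameter of 0, and the parameter rises smoothly as confidence drops or as more heavily weighted components are affected.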

[0010] In a preferred embodiment, the step of modulating the dynamic parameters of continuous-time modeling using the parameter includes: dynamically modulating the base time constant of the liquid neural network based on the behavior modulation parameter to obtain an adjusted base time constant, thereby changing the continuous-time response scale of the liquid neural network; and dynamically adjusting the continuous-time state update process of the liquid neural network based on the adjusted base time constant.
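
As a rough illustration of this step, the sketch below rescales a base time constant with a modulation parameter and then uses the adjusted value in a state update. The source gives no formula, so the form tau' = tau / (1 + gain * m), the leaky-state dynamics, and the names `modulate_time_constant`, `update_state`, and `gain` are assumptions.

```python
def modulate_time_constant(base_tau: float, modulation: float, gain: float = 4.0) -> float:
    """Shrink the time constant as the modulation parameter grows (assumed form)."""
    if base_tau <= 0:
        raise ValueError("base_tau must be positive")
    m = min(max(modulation, 0.0), 1.0)  # clamp the modulation parameter to [0, 1]
    return base_tau / (1.0 + gain * m)


def update_state(state: float, drive: float, tau: float, dt: float) -> float:
    """One explicit-Euler step of a leaky state: dx/dt = (-x + drive) / tau."""
    return state + dt * (-state + drive) / tau
```

A smaller adjusted time constant makes the state respond faster to the same input, which is the intended effect when the operation looks more anomalous.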

[0011] In a preferred embodiment, generating the operation trajectory includes: constructing control relationships based on interface control attributes; obtaining the position information and category attributes of interface elements; performing text recognition on the screen image to obtain text content; associating the text content with interface elements based on spatial inclusion relationships to form interface semantic information; fusing the control relationships with the interface semantic information and encoding them through a timing encoder to generate the operation trajectory.

[0012] In a preferred embodiment, before performing time alignment and feature fusion to generate the operation trajectory, the method further includes: associating and mapping the interface elements identified in the image with interface controls; for the mapped interface elements, comparing the text content identified in the image with the corresponding attribute text content obtained from the system interface; when the two are determined to be inconsistent, determining that the corresponding control attribute data source is untrustworthy and generating an alarm, and using historically reliable features for interpolation in subsequent feature fusion.

[0013] In a preferred embodiment, the method further includes: calculating a system stability index based on the hidden state trajectories before and after modulation using a continuous-time model; and constraining the magnitude or rate of change of the behavior modulation parameters when the stability index changes abnormally, so as to limit the state evolution of the liquid neural network within a preset stability domain.
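
One possible reading of this constraint is sketched below. The source does not define the stability index or the limiting rule, so the index (largest distance between paired hidden states before and after modulation), the rate limit, and the names `stability_index`, `constrain_modulation`, `index_limit`, and `max_step` are all assumptions.

```python
import math


def stability_index(before, after):
    """Largest Euclidean distance between paired hidden states (assumed index)."""
    return max(math.dist(b, a) for b, a in zip(before, after))


def constrain_modulation(prev_m, proposed_m, index, index_limit=1.0, max_step=0.05):
    """Rate-limit the modulation parameter when the stability index is abnormal."""
    if index <= index_limit:
        return proposed_m
    step = max(-max_step, min(max_step, proposed_m - prev_m))
    return prev_m + step
```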

[0014] In a preferred embodiment, the evolutionary features include at least the magnitude of change of the hidden state vector, the rate of change, or the degree of deviation of it from the baseline features established during historical compliant operation periods.
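
The three evolutionary features named above can be computed from a hidden-state trajectory roughly as follows. This is a sketch under assumptions: Euclidean distance is used for all three quantities, the baseline is a single vector, and the function name `evolution_features` is hypothetical.

```python
import math


def evolution_features(hidden_states, timestamps, baseline):
    """Return (change magnitude, change rate, baseline deviation) for a trajectory."""
    first, last = hidden_states[0], hidden_states[-1]
    magnitude = math.dist(first, last)            # how far the state moved
    duration = timestamps[-1] - timestamps[0]
    rate = magnitude / duration if duration > 0 else 0.0
    deviation = math.dist(last, baseline)         # distance from the compliant baseline
    return magnitude, rate, deviation
```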

[0015] In a preferred embodiment, a graphical user interface (GUI) operation behavior detection system includes: a data acquisition module for acquiring control attributes and screen image data of the user in the GUI, performing time alignment and feature fusion to generate an operation trajectory; a continuous-time modeling module for performing continuous-time modeling of the operation trajectory using a liquid neural network; an anomaly analysis and modulation module for analyzing the anomaly degree of the operation based on the output of the continuous-time modeling to obtain behavior modulation parameters, and using these parameters to modulate the dynamic parameters of the continuous-time modeling; and a feature extraction and judgment module for extracting evolutionary features based on the modulated continuous-time modeling results, and determining whether the operation is a violation based on the evolutionary features.

[0016] The technical effects and advantages of the graphical user interface operation behavior detection method and system of the present invention are as follows. The invention collects interface control attributes and screen image data from a Windows graphical interface, performs time alignment and feature fusion on the multi-source data to generate operation trajectories, and builds a continuous-time model on these trajectories to characterize the continuous-time dynamics of the user's graphical operation process. Based on the output of the continuous-time modeling, it performs operation anomaly analysis, generates time-continuous behavior modulation parameters, and modulates the dynamic parameters of the continuous-time modeling, so that the model adaptively adjusts its time response scale according to the degree of operation anomaly and better perceives how abnormal operations evolve. It then extracts evolutionary features from the modulated continuous-time modeling results, such as the magnitude of change of the hidden state vector, its rate of change, and its degree of deviation from historical compliant operation baselines, and uses them to determine whether a user's graphical operation constitutes a violation. By establishing a mapping between operation confidence and the affected system functional components, it supports fine-grained identification of the scope and risk level of violations. In addition, by constraining and adjusting the stability of the hidden state trajectory of the continuous-time model, it suppresses interference from abnormal noise or unreliable data on model evolution. This addresses the problems of existing Windows graphical operation violation identification technologies, such as difficulty in depicting the dynamic characteristics of continuous interaction, insufficient adaptability to different operation rhythms, delayed anomaly detection, and low accuracy in identifying the scope of violation impact.

Attached Figure Description

[0017] Figure 1 is a schematic diagram of the graphical user interface operation behavior detection method provided in an embodiment of the present invention.

[0018] Figure 2 is a diagram illustrating time alignment matching.

[0019] Figure 3 is a schematic diagram illustrating the dynamic changes of the behavior modulation parameters and the time constant.

[0020] Figure 4 is a schematic diagram illustrating the similarity between evolutionary features and the compliance pattern library, together with the threshold line.

[0021] Figure 5 is a schematic diagram of a graphical user interface (GUI) user behavior detection system.

Detailed Implementation

[0022] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

Embodiment 1

As shown in Figure 1, the present invention provides a graphical user interface operation behavior detection method, comprising the following steps:
S1: Collect the control attributes and screen image data of the user in the graphical interface, perform time alignment and feature fusion, and generate an operation trajectory.
S2: Use a liquid neural network to model the operation trajectory in continuous time.
S3: Based on the output of the continuous-time modeling, analyze the degree of operation anomaly to obtain the behavior modulation parameter, and use this parameter to modulate the dynamic parameters of the continuous-time modeling.
S4: Based on the modulated continuous-time modeling results, extract evolutionary features, and determine from the evolutionary features whether the operation is a violation.

[0023] By collecting and fusing control attributes and screen image data from the Windows graphical interface, a temporally consistent and semantically complete operation trajectory can be constructed, thereby improving the accuracy and completeness of user operation behavior representation. Continuous-time modeling based on the operation trajectory can depict the dynamic evolution of user graphical operations in the continuous time domain, enhancing adaptability to different operation rhythms and interaction complexities. By performing operation anomaly analysis on the continuous-time modeling output and generating behavior modulation parameters, adaptive modulation of the model's dynamic parameters can be achieved, allowing the model state to adjust according to changes in operational risk, improving the sensitivity and stability of anomaly response. Furthermore, by extracting evolutionary features and determining violations based on the modulated model results, compliant and non-compliant operations can be effectively distinguished, reducing the false positive rate and improving the reliability and real-time performance of capturing and recognizing non-compliant behaviors in Windows graphical operations.

[0024] S1: Collect the control attributes and screen image data of the user in the graphical interface, perform time alignment and feature fusion, and generate an operation trajectory.

[0025] In this embodiment, collecting interface control attributes and screen image data includes the following. 1. Obtaining UI control attribute data by calling the operating system's interface management interface or accessibility interface. This embodiment runs in a Windows operating system environment. By calling the interface management interface or accessibility interface provided by the operating system, it traverses the graphical interface that is currently visible to or interactive for the user and collects the interface control attribute data. The interface management interface or accessibility interface can be the Windows UI Automation interface, the Microsoft Active Accessibility interface, or a system-level interface with equivalent functionality; through these interfaces, the structured attribute information of the interface controls can be obtained directly at the operating system level without intruding into the internal logic of the target application. The UI control attribute data obtained through these interfaces includes at least the control's unique identifier, the control type, the identifier of the window to which the control belongs, the control's position in the screen coordinate system, the control's enabled state, visible state, and focus state, and the interactive attribute text content related to user operation.

[0026] This step runs as an independent system service with SYSTEM level privileges, ensuring access to all user interface control information; the sampling frequency is set to 10Hz (i.e., 10 times per second), and each sampling traverses all top-level windows and their child controls in the current desktop session to build a complete control tree structure; the collected data is temporarily stored in a memory buffer in JSON format, containing metadata such as timestamps, process IDs, window handles, and control paths.

[0027] 2. Capturing screen image data of the screen or a specified window area by calling a graphics device interface. This embodiment acquires screen image data by calling a graphics device interface, which can be the Windows Graphics Device Interface (GDI), the DirectX graphics interface, the Desktop Duplication API, or a screen capture interface with equivalent functionality. Through the graphics device interface, a screenshot of the entire display or of the display area of the currently active window is taken at a preset sampling frequency, yielding screen image data that changes in step with the user's operation behavior. The screen image data is represented as a pixel matrix containing the visual presentation of windows, controls, icons, and text in the current interface, and provides the raw visual input for subsequent interface semantic extraction. In this embodiment, the sampling frequency is set to 10 Hz (10 times per second). This step uses the Desktop Duplication API, a high-efficiency screen capture interface provided by Windows 8 and later. It accesses the output of the desktop compositing engine directly, avoiding the performance issues and visual artifacts that traditional GDI screenshots may cause. The captured image resolution is 1920×1080, the color format is BGRA32, and capture is hardware accelerated. The image data is temporarily stored in a shared memory area in RAW format with a capture timestamp attached.

[0028] Furthermore, to ensure the consistency of interface control attribute data and screen image data in the time dimension, this embodiment introduces a unified timestamp marking mechanism when collecting the above two types of data. Specifically, when each interface control attribute collection is completed and each screen image capture is completed, the current timestamp is obtained through the system's high-precision timing interface and appended to the corresponding data record. The timestamp is generated using a monotonically increasing system timing method to avoid time sequence problems caused by system time adjustments.
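
A minimal sketch of the timestamp marking described above. The source does not name a specific timing call; `time.perf_counter_ns` from the Python standard library is used here as an assumed stand-in because it is a monotonic, high-resolution clock that is unaffected by wall-clock adjustments. The record layout and the field name `t_mono_ns` are hypothetical.

```python
import time


def stamp(record: dict) -> dict:
    """Return a copy of the record with a monotonic capture timestamp in nanoseconds."""
    return {**record, "t_mono_ns": time.perf_counter_ns()}


# Both data sources are stamped from the same clock, so their order is comparable.
control_record = stamp({"kind": "controls", "window": 1})
image_record = stamp({"kind": "image", "width": 1920, "height": 1080})
```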

[0029] In this embodiment, the interface control attribute data and screen image data are time-aligned based on the timestamp. Specifically, within a preset time error tolerance, the interface control attribute record and the screen image record with the smallest timestamp difference are treated as synchronized data for the same time point. A high-precision timer is used to ensure the accuracy of the timestamps, and the allowable time error is preset to a fixed value in milliseconds (the example in Figure 2 uses a 30-millisecond tolerance). The specific implementation is as follows: a sliding-window timestamp matching algorithm is used; the algorithm maintains two first-in-first-out queues that store control attributes and image data respectively; for each newly arrived data packet, the algorithm searches the other queue for the entry with the closest timestamp; after a successful match, the paired data is encapsulated into a unified data structure containing the alignment timestamp (the average of the two timestamps), the control attribute data, the screen image data, and a data quality marker (indicating whether the alignment was successful); the aligned data packet is sent to the encoding module through a named pipe for further processing.
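
The matching step can be sketched offline as follows. This is a simplified illustration, not the embodiment's implementation: it processes two already-sorted lists rather than live queues fed by a named pipe, the 30 ms tolerance is taken from the Figure 2 description, and the function name `align` and the output field names are hypothetical.

```python
from collections import deque

TOLERANCE_MS = 30.0  # assumed default, taken from the tolerance described for Figure 2


def align(controls, images, tolerance_ms=TOLERANCE_MS):
    """Pair each control sample with the closest image sample in time.

    Both inputs are lists of (timestamp_ms, payload) sorted by timestamp.
    Every control sample yields one output record with an 'aligned' quality flag.
    """
    img_q = deque(images)
    out = []
    for t_c, ctrl in controls:
        # Discard queued images while the next one is at least as close to t_c.
        while len(img_q) > 1 and abs(img_q[1][0] - t_c) <= abs(img_q[0][0] - t_c):
            img_q.popleft()
        if img_q and abs(img_q[0][0] - t_c) <= tolerance_ms:
            t_i, img = img_q.popleft()
            out.append({"timestamp_ms": (t_c + t_i) / 2, "controls": ctrl,
                        "image": img, "aligned": True})
        else:
            out.append({"timestamp_ms": t_c, "controls": ctrl,
                        "image": None, "aligned": False})
    return out


pairs = align([(0, "c0"), (100, "c1"), (200, "c2")],
              [(10, "i0"), (95, "i1"), (260, "i2")])
```

In this example the first two control samples find an image within tolerance, while the third is 60 ms away from its nearest image and is flagged as unaligned.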

[0030] It should be noted that the output after time alignment is a time-aligned sequence of <control properties, screen image> data pairs. These data pairs are arranged in chronological order, with a time interval of approximately 100 milliseconds (corresponding to a 10 Hz sampling frequency). Each data pair is marked with a unique timestamp to ensure that the timing relationship of user operations can be accurately restored in subsequent processing stages. Figure 2 shows the distribution of the interface control attribute sampling sequence and the screen image sampling sequence on the time axis. The red vertical lines represent screen image sampling points, and the blue dots represent control attribute sampling points. The red solid lines connect data pairs whose timestamp difference is within the 30-millisecond tolerance and which are therefore treated as synchronous data. The green dashed lines connect data pairs whose timestamp difference exceeds the 30-millisecond tolerance and which therefore cannot be treated as synchronous data. The alignment process adopts the minimum timestamp difference matching principle to provide an accurate time alignment basis for the subsequent operation trajectory construction.

[0031] Training data preparation. Collect a large amount of real user operation data in a controlled Windows environment, including: Normal operation data: Operation record data from standard business processes; Abnormal operation data: Records of violations captured in simulation or real-world scenarios; Data labeling: Label each operation sequence with tags such as operation type, target control, and whether it violates regulations.

[0032] This step provides a highly consistent and reliable foundation for constructing subsequent operation trajectories by synchronously collecting structured interface control attribute data and unstructured screen image data at the operating system level and performing precise time alignment processing on the multi-source data.

[0033] In this embodiment, obtaining the operation trajectory includes: constructing control relationships based on interface control attributes, including at least spatial proximity relationships, focus transfer relationships, and operation sequence relationships between controls; performing interface element detection on the screen image through a preset visual analysis module to extract interface semantic information; and jointly encoding the control relationships and interface semantic information in chronological order to generate the operation trajectory.

[0034] Specifically, after the data collection and time alignment step is completed, the timestamp is used as a unified time sequence index: control relationships are constructed from the interface control attribute data obtained at each timestamp, and interface semantic information is extracted from the corresponding screen image.

[0035] Building control relationships based on UI control property data includes the following steps: The position information of each interface control in the screen coordinate system is analyzed, and the spatial proximity relationship between controls is established by calculating the Euclidean distance between the center points of any two controls. When the Euclidean distance is less than a preset threshold (100 pixels in this embodiment), it is determined that there is a spatial proximity relationship between the corresponding controls. At the same time, in order to ensure the connectivity of the control relationship graph, at least three neighboring controls with the closest Euclidean distance are retained for each control. Based on spatial proximity, focus transfer relationships between controls are constructed by observing changes in the focus state of controls over consecutive timestamps. Specifically, at each sampling time point, the control currently receiving keyboard focus is obtained through the Windows UI Automation interface as the focus control. For two adjacent time points, if the focus control changes, it is further determined whether the dwell time of the focus on the previous control exceeds a preset dwell threshold (150 milliseconds in this embodiment) to eliminate the influence of sampling noise or instantaneous focus flicker. When the dwell time condition is met, a directed edge is added to the control relationship from the previous focus control to the next focus control, and the time interval of focus transfer is recorded as the weight attribute of this edge. Simultaneously, the operation sequence relationship between controls is constructed by combining the triggering order of user input events. User input events, including mouse click events and keyboard key events, are captured by monitoring through Windows hooks or system message queues. 
For each input event, its event type, occurrence timestamp, and target control are recorded, where the target control is determined by screen coordinate mapping or by the currently focused control. Input events within the same user operation session are sorted according to their timestamps. When two consecutive input events correspond to different target controls, a directed edge is added to the control relationship, pointing from the previous control to the next control, with the time interval between the two events used as the weight of the edge. The final constructed control relationships can be represented as a weighted directed graph $G = (V, E, W)$, where $V$ represents the set of control nodes; $E$ represents the set of edges, which includes three types: spatial proximity edges (undirected), focus transfer edges (directed), and operation order edges (directed); and $W$ denotes the weight function, where the weight of a spatial proximity edge is the reciprocal of the distance, and the weights of a focus transfer edge and an operation order edge are the reciprocal of the time interval.
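
The spatial proximity part of this graph can be sketched as below, using the 100-pixel threshold, the three-nearest-neighbor rule, and the reciprocal-distance weight from the text. The function name `spatial_edges`, the dictionary-based representation, and the handling of coincident centers are assumptions; focus transfer and operation order edges are omitted.

```python
import math


def spatial_edges(centers, threshold=100.0, min_neighbors=3):
    """Build undirected spatial proximity edges between controls.

    centers maps a control id (string) to its (x, y) center point.
    Returns a dict mapping frozenset({a, b}) to the edge weight 1 / distance.
    """
    edges = {}
    for a, pa in centers.items():
        # Other controls sorted by distance from control a, nearest first.
        others = sorted((math.dist(pa, pb), b) for b, pb in centers.items() if b != a)
        for rank, (d, b) in enumerate(others):
            # Keep edges under the threshold, plus the nearest few for connectivity.
            if d < threshold or rank < min_neighbors:
                edges[frozenset((a, b))] = 1.0 / d if d > 0 else float("inf")
    return edges


edges = spatial_edges({"ok": (0, 0), "cancel": (50, 0), "help": (500, 0)},
                      min_neighbors=1)
```

Here `ok` and `cancel` are linked because they are 50 pixels apart, and `help` is linked to its single nearest neighbor `cancel` only because of the minimum-neighbor rule.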

[0036] While constructing the control relationships, semantic information of the interface is extracted from the time-aligned screen image. Specifically, a pre-defined visual analysis module is used to detect interface elements in the screen image. This visual analysis module is implemented using a deep learning-based object detection model, such as YOLO, Faster R-CNN, or a model with equivalent functionality. The model is trained using a pre-built interface element annotation dataset to identify interface elements such as windows, buttons, text boxes, checkboxes, list boxes, and icons in the screen image, and outputs the corresponding bounding box position information and category attributes. After completing the interface element detection, text recognition processing is performed on the screen image to obtain the text content presented in the interface. Subsequently, based on the spatial relationship between the text content and the interface controls, the recognized text content is associated with the corresponding interface controls. Specifically, for each text content block, the intersection-union ratio or center point distance between its bounding box and the bounding boxes of each interface element is calculated, and the text content is associated with the interface control with the highest spatial overlap or the closest distance. When the corresponding control has already obtained the attribute text through the system interface, the text recognized by OCR is used as the visual text label of the control for supplementation or consistency verification. Further, the associated text content is fused with the position information and category attributes of the interface elements to form interface semantic information, and it is then vectorized and represented. In this embodiment, the specific implementation process is as follows: This embodiment uses the YOLOv5 model for interface element detection. 
The model is fine-tuned on a self-built Windows GUI element dataset. This dataset contains 50,000 labeled images covering 15 common GUI element types. Text recognition uses the Tesseract OCR engine, combined with a custom character set and language model tailored to the characteristics of screen text. Semantic information vectorization uses a pre-trained BERT model to encode the text content into a 768-dimensional vector, which is then concatenated with visual features (bounding box coordinates, category one-hot encoding) to form the final semantic feature vector.
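
The association of recognized text with interface elements by spatial overlap can be sketched as follows. This illustration covers only the intersection-over-union criterion and leaves out the center-distance alternative mentioned in the text; the box format `(x1, y1, x2, y2)` and the names `iou` and `associate` are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(text_boxes, element_boxes):
    """Map each text id to the element id with the highest IoU, or None if no overlap."""
    result = {}
    for text_id, text_box in text_boxes.items():
        scored = [(iou(text_box, box), elem_id) for elem_id, box in element_boxes.items()]
        best_score, best_id = max(scored, default=(0.0, None))
        result[text_id] = best_id if best_score > 0 else None
    return result


elements = {"button": (0, 0, 100, 40), "textbox": (0, 100, 200, 140)}
texts = {"OK": (10, 10, 60, 30), "stray": (500, 500, 520, 510)}
links = associate(texts, elements)
```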

[0037] Before jointly encoding the control relationships and interface semantic information, consistency checks are performed on information from different data sources. Specifically, interface elements identified from screen images at the same timestamp are mapped and matched with interface control attributes obtained through the system interface based on their spatial location and control type. For the same interface element that has been mapped, its visual text content and control attribute text content are obtained respectively, and the semantic similarity between the two is calculated using a natural language processing model. At the same time, the difference between the two is calculated based on the string edit distance. When the semantic similarity is lower than a preset threshold (0.85 in this embodiment) or the difference is higher than a preset distance threshold (0.2 in this embodiment), the corresponding data source is determined to be untrustworthy, and the features of the previous reliable time point are used for interpolation, or only the features of another reliable data source are used for subsequent encoding. At the same time, a high-risk alarm message is generated for verification.
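
The edit-distance half of this consistency check can be sketched as below. The semantic similarity half needs a language model and is omitted. Normalizing the Levenshtein distance by the longer string's length is an assumption made so that the 0.2 threshold from the text is meaningful; the names `edit_distance` and `is_consistent` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def is_consistent(ocr_text: str, attr_text: str, max_ratio: float = 0.2) -> bool:
    """True when the normalized edit distance does not exceed the threshold."""
    longest = max(len(ocr_text), len(attr_text))
    if longest == 0:
        return True
    return edit_distance(ocr_text, attr_text) / longest <= max_ratio
```

Under this normalization a single OCR misread in a six-character label still passes, while a label whose on-screen text differs substantially from its reported attribute text is flagged.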

[0038] After data filtering, the control relationships built from the UI control attributes are represented as a graph in which nodes are UI controls and edges represent spatial proximity, focus transfer, and operation order. The weight of each edge is calculated and normalized independently according to its physical meaning (such as distance or time interval), yielding a heterogeneous graph of control relationships. This heterogeneous graph is input into a two-layer graph attention network for embedding to obtain a structured vector representation. The input to the network is the initial feature vector of each control node, which consists of two parts: a type encoding obtained by one-hot encoding or a learnable embedding of the control type (such as button or text box), and a position encoding obtained by a linear transformation or sinusoidal position encoding of the control's normalized screen coordinates (such as the center point x, y).
The core of the network is a multi-head attention mechanism. In the first layer, each node aggregates the features of its first-order neighbors through attention weights, initially fusing local relational information. In the second layer, each node again aggregates the features of its neighbors, which now carry information propagated from their own first-order neighbors, so that contextual dependencies within a two-hop range of the graph are captured. After two layers of propagation and transformation, each node has an updated feature representation containing its own attributes and the local graph structure. Finally, average pooling over the updated feature vectors of all nodes in the graph at the current timestamp compresses them into a single, fixed-dimensional global graph feature vector. This vector (128 dimensions in this embodiment) compactly represents the control relationship structure and interaction state of the entire graphical user interface at the current moment and provides the basis for subsequent fusion with visual semantic features. The control relationship feature vector and the interface semantic information vector obtained at the same timestamp are then fused, by vector concatenation or weighted summation, to obtain the feature representation for that time point. The feature representations of all time points are then input into the temporal encoder in timestamp order for joint modeling.
The temporal encoder uses a state update mechanism to relate the features of the current time point to those of historical time points, so that the encoding result retains the temporal order of user operations, the changes in control relationships, and the semantic evolution of the interface, and it finally outputs the operation trajectory that changes over time. The operation trajectory is represented as a time series of vectors, providing a unified, structured input for the continuous-time dynamic evolution of the subsequent liquid neural network. The temporal encoder can be a lightweight temporal convolutional network, a recurrent neural network, a gated recurrent unit (GRU), a long short-term memory network, or a temporal modeling structure with equivalent functionality. In this embodiment, a bidirectional GRU network with a hidden layer dimension of 256 is used, which can effectively capture the contextual dependencies in the operation sequence.
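The pooling and fusion steps described above can be illustrated with a minimal sketch. It assumes that the per-node embeddings and the interface semantic vector are already available as plain lists of floats; the function names and the mixing weight `alpha` are hypothetical and are not taken from this application.

```python
from typing import List

Vector = List[float]

def mean_pool(node_vectors: List[Vector]) -> Vector:
    """Average all node embeddings into one fixed-dimensional graph vector."""
    if not node_vectors:
        raise ValueError("graph has no nodes")
    count = len(node_vectors)
    dim = len(node_vectors[0])
    return [sum(vec[d] for vec in node_vectors) / count for d in range(dim)]

def fuse_concat(graph_vec: Vector, semantic_vec: Vector) -> Vector:
    """Fusion by vector concatenation."""
    return list(graph_vec) + list(semantic_vec)

def fuse_weighted(graph_vec: Vector, semantic_vec: Vector, alpha: float = 0.5) -> Vector:
    """Fusion by weighted summation; alpha is a hypothetical mixing weight."""
    if len(graph_vec) != len(semantic_vec):
        raise ValueError("weighted summation needs vectors of equal dimension")
    return [alpha * g + (1.0 - alpha) * s for g, s in zip(graph_vec, semantic_vec)]
```

Concatenation keeps both sources intact at the cost of a larger input to the temporal encoder, whereas weighted summation keeps the dimension fixed but requires both vectors to share one dimension.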

[0039] In different embodiments of the present invention, visual analysis, text recognition and semantic encoding can employ any image processing algorithm, text recognition algorithm or semantic representation model capable of achieving the corresponding functions. The algorithm names mentioned above are for illustrative purposes only.

[0040] S2: A liquid neural network is used to model the operation trajectory in continuous time.

[0041] The continuous-time modeling based on the operation trajectory is achieved through a continuous-time model with time-scale adaptability.

[0042] This embodiment employs a continuous-time neural network model with time-scale adaptability as the continuous-time model, such as a Liquid Time-Constant Network (LTC). It should be understood that the continuous-time model described in this invention is not limited to the example model above, and other continuous-time modeling structures based on ordinary differential equations, delay differential equations, or state-space forms can also be used.

[0043] The liquid neural network is driven by ordinary differential equations (ODEs) and performs continuous-time dynamics modeling of the operation trajectory. Based on its base time constant, it evolves the hidden states in continuous time and outputs structured operation record data. Specifically, the liquid neural network is a neural network model with a continuous-time hidden state update mechanism: the change of its hidden state over time is described by ODEs, which enables dynamic modeling of continuous-time user operation behavior. The hidden state of each liquid neuron in the liquid neural network satisfies the following continuous-time dynamic equation:

$$\frac{dh_i(t)}{dt} = -\frac{h_i(t)}{\tau} + f\big(W_h\,h(t) + W_x\,x(t) + b\big)_i$$

where $h_i(t)$ is the hidden state of the $i$-th liquid neuron at time $t$; $h(t)$ is the hidden state vector consisting of the hidden states of all liquid neurons; $x(t)$ is the input operation trajectory; $\tau$ is the base time constant, a positive adjustable parameter (in this embodiment, $\tau$ = …); $f(\cdot)$ is the activation function, which includes, but is not limited to, the hyperbolic tangent function, the sigmoid function, or a functionally equivalent nonlinear function; $W_h$ and $W_x$ are the hidden state weight matrix and the input weight matrix, respectively; $b$ is the bias vector; and $t$ is the continuous time variable. The above mathematical expression only illustrates one implementation of the continuous-time model. Its specific parameter form, number of neurons, numerical solution method, and time step can be adjusted or replaced according to actual application needs, and it does not constitute a limitation on the technical solution of this invention.

[0044] During the model training phase, the system uses a large amount of labeled user operation sequence data for supervised learning of the weight matrices and bias parameters of the liquid neural network. The training objective is to minimize the difference between the structured operation record data output by the network (including predicted operation types, operation object identifiers, etc.) and the real records. To achieve this, a differentiable loss function is defined; for example, cross-entropy loss can be used for operation type classification, and a classification or regression loss can be used for operation object identifiers. Since the hidden state evolution of a liquid neural network is described by an ordinary differential equation (ODE), training uses backpropagation through time (BPTT). The gradient of the loss function with respect to all network parameters is calculated using the adjoint sensitivity method or automatic differentiation through the ODE solver. The weights and biases are then updated iteratively using a gradient-based optimization algorithm (such as Adam). This process enables the network to learn to extract key temporal patterns from continuous operation trajectories and map them accurately to standardized operation record data, thereby improving the modeling accuracy of the temporal dependencies of complex user interaction behaviors.

[0045] The operation trajectory is input as a continuous-time input signal into the liquid neural network, and the decay rate and response sensitivity of the hidden state to historical inputs are controlled according to the basic time constant.

[0046] It should be noted that the basic time constant is a key adjustable hyperparameter, which physically characterizes the time scale of network state decay or update. The larger its value, the longer the network remembers historical information and the smoother its response to changes in the current input. In actual operation, the above ordinary differential equations are solved by numerical methods (such as the Euler method or the fourth-order Runge-Kutta method) to obtain the evolution results of the hidden state in continuous time. For each time point, the corresponding hidden state vector is input as a high-dimensional feature representation into the fully connected layer or decoder to generate structured operation record data.

[0047] Specifically, in one embodiment, the liquid neural network adopts an architecture based on the Liquid Time-Constant Network (LTC) and contains 256 liquid neurons; the ODE is solved using the fourth-order Runge-Kutta method with a fixed step size of 10 ms (consistent with the data sampling interval); the input layer dimension is 512 (matching the operation trajectory dimension), and the output layer dimension is 128 (hidden state dimension); the network parameters are trained using the backpropagation through time (BPTT) algorithm.
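The fixed-step fourth-order Runge-Kutta solution can be sketched as follows. This is an illustration only: it assumes a simple leaky hidden-state dynamic of the form dh/dt = -h/tau + tanh(W_h h + W_x x + b), holds the input constant across each step, and uses hypothetical names (`make_rhs`, `rk4_step`) that do not come from this application.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]

def make_rhs(w_h: Matrix, w_x: Matrix, bias: Vector, tau: float) -> Callable[[Vector, Vector], Vector]:
    """Build dh/dt = -h/tau + tanh(W_h h + W_x x + b) (assumed form of the dynamics)."""
    def rhs(h: Vector, x: Vector) -> Vector:
        out = []
        for i in range(len(h)):
            pre = bias[i]
            pre += sum(w_h[i][j] * h[j] for j in range(len(h)))
            pre += sum(w_x[i][j] * x[j] for j in range(len(x)))
            out.append(-h[i] / tau + math.tanh(pre))
        return out
    return rhs

def rk4_step(rhs: Callable[[Vector, Vector], Vector], h: Vector, x: Vector, dt: float) -> Vector:
    """Advance the hidden state by one RK4 step; x is held constant over the step."""
    def shifted(scale: float, k: Vector) -> Vector:
        return [hi + scale * ki for hi, ki in zip(h, k)]
    k1 = rhs(h, x)
    k2 = rhs(shifted(dt / 2.0, k1), x)
    k3 = rhs(shifted(dt / 2.0, k2), x)
    k4 = rhs(shifted(dt, k3), x)
    return [hi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for hi, a, b, c, d in zip(h, k1, k2, k3, k4)]
```

With `dt = 0.01` the step matches the 10 ms sampling interval of this embodiment; a smaller `tau` makes the decay term dominate, so the state forgets earlier inputs faster.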

[0048] In different embodiments of the present invention, the number of neurons, time step, numerical integration method, and training parameters of the continuous-time model can be set according to computing resources, real-time requirements, and application scenarios. For example, numerical integration algorithms of different orders or network structures of different sizes can be used. The above parameter selection is only an example and does not constitute a limitation of the present invention.

[0049] Specifically, the fully connected layer or decoder receives the hidden state vector of the liquid neural network at the current time point and performs linear transformation and nonlinear mapping on it; the linear transformation is implemented through the weight matrix and bias vector obtained through pre-training, mapping the hidden state vector to a predefined output feature space; the nonlinear mapping is used to enhance the model's ability to distinguish different operation modes; In this embodiment, the output feature space is divided according to the fields of the structured operation record data, including at least a first output subspace for representing operation type, a second output subspace for representing operation object identifier, and a third output subspace for representing operation time information. Specifically, for the operation type field, the Softmax function is used to normalize the corresponding output subspace to obtain the probability distribution of each candidate operation type, and the operation type with the highest probability value is selected as the operation type prediction result for the current time point. For the operation object identifier field, classification or regression methods are used to output the identifier information of the corresponding control or functional object. For the timestamp field, the system timestamp corresponding to the current hidden state is used directly, or the operation occurrence time is corrected and output through regression. The weight parameters in the fully connected layer or decoder are also learned through gradient descent optimization algorithms during the model training phase, so as to minimize the difference between the output structured operation record data and the real user operation record data, thereby achieving an accurate structured representation of user operation behavior. 
In this way, the continuous-time hidden state of the liquid neural network is converted, at each time point, into structured operation record data containing operation type, operation object identifier, and timestamp fields, providing standardized input for subsequent risk assessment and violation determination.
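The decoding step can be sketched as two small classification heads plus a pass-through timestamp. The sketch assumes linear heads followed by Softmax for both the operation type and the operation object identifier (the text also allows regression for the identifier); the label lists, weights, and function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]

def softmax(logits: Vector) -> Vector:
    """Numerically stable Softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(h: Vector, weights: Matrix, bias: Vector, labels: List[str]) -> Tuple[str, float]:
    """Linear map of the hidden state, then Softmax; return the top label and its probability."""
    logits = [b + sum(w * x for w, x in zip(row, h)) for row, b in zip(weights, bias)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

def decode_record(h: Vector, type_head: Tuple[Matrix, Vector, List[str]],
                  object_head: Tuple[Matrix, Vector, List[str]],
                  timestamp_ms: int) -> Dict[str, object]:
    """Turn one hidden state into a structured operation record."""
    op_type, op_prob = classify(h, *type_head)
    obj_id, _ = classify(h, *object_head)
    # The system timestamp of the current hidden state is used directly.
    return {"operation_type": op_type, "type_probability": op_prob,
            "object_id": obj_id, "timestamp_ms": timestamp_ms}
```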

[0050] This step overcomes the shortcomings of traditional recurrent neural networks based on discrete time steps in modeling fine and continuous user interaction temporal behavior by introducing a continuous-time dynamic evolution model driven by ordinary differential equations. This enables the network to smoothly and dynamically model the complex temporal dependencies in the user operation trajectory in a way that simulates a continuous-time system.

[0051] S3: Based on the output of continuous-time modeling, the operational anomaly degree is analyzed to obtain the behavior modulation parameter, and this parameter is used to modulate the dynamic parameters of continuous-time modeling.

[0052] In one embodiment, key operation feature vectors are extracted from the output of continuous-time modeling, the similarity between each such vector and the feature vector of each pattern in a predefined compliant operation pattern library is calculated, and the operation confidence is generated from this similarity. Specifically, the output of the continuous-time modeling includes structured operation record data. The similarity is calculated as follows (cosine similarity is used in this embodiment):

$$\mathrm{sim}(v, p_k) = \frac{v \cdot p_k}{\lVert v \rVert \, \lVert p_k \rVert}$$

where $\mathrm{sim}(v, p_k)$ is the cosine similarity between the key operation feature vector and the feature vector of a pattern in the predefined compliant operation pattern library, $v$ is the key operation feature vector, and $p_k$ is the feature vector of a pattern in the library. The operation confidence $C$ is obtained from the highest similarity or from a weighted average. Based on the highest similarity, the formula is:

$$C = \max_k \, \mathrm{sim}(v, p_k)$$

Based on the weighted average, the formula is:

$$C = \sum_k w_k \, \mathrm{sim}(v, p_k)$$

where $w_k$ is the weight coefficient of the $k$-th compliant pattern feature vector in the predefined compliant operation pattern library, and $k$ is the index of the compliant pattern feature vector.
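A minimal sketch of the confidence calculation is given below. It treats the pattern library as a list of plain float vectors; dividing by the sum of the weights in the weighted variant is an assumption (it has no effect when the weights already sum to 1), and all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]

def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def confidence_max(v: Vector, patterns: List[Vector]) -> float:
    """Operation confidence as the highest similarity to any compliant pattern."""
    return max(cosine_similarity(v, p) for p in patterns)

def confidence_weighted(v: Vector, patterns: List[Vector], weights: List[float]) -> float:
    """Operation confidence as a weighted average of the similarities."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    return sum(w * cosine_similarity(v, p) for w, p in zip(weights, patterns)) / total
```

The maximum rewards a close match to any single compliant pattern, while the weighted average lets frequently observed patterns count for more.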

[0053] It should be noted that, in one embodiment, the predefined compliant operation pattern library is constructed as follows. During the initial deployment or training phase of the system, a large amount of known, compliant user operation record data is collected. Each piece of operation data is processed to extract key features, such as operation type, operation object, and execution time interval, forming a structured record sequence. An unsupervised clustering algorithm (such as DBSCAN) is used to cluster these features, grouping records with similar operation patterns into the same category. A representative of each cluster (for example, its center) is taken as a candidate compliant operation pattern. These candidate patterns are reviewed and labeled by domain experts based on security policies, or by automated rule scripts (e.g., accepting patterns that only include read/write operations on files in trusted paths). The feature vectors of the patterns confirmed to be compliant are stored in the compliant operation pattern library.

[0054] In other embodiments, the compliance operation pattern library can also be built through rule engines, historical behavior statistical analysis, reinforcement learning or other machine learning methods, which can be used individually or in combination.

[0055] Furthermore, based on the mapping relationship between interface controls and system functional components, the functional components directly invoked by the operation are determined, and combined with the dependencies between functional components, the set of functional components affected by the operation is determined. This embodiment achieves the mapping through a combination of the following two methods: Static mapping table: For known common application software (such as Office suite, browser), a mapping table is pre-established between its key interface controls (identified by control ID, window class name, text content, etc.) and background functional components (such as "file saving", "network request sending", "registry modification"). Dynamic API hooks: At the system level, key Windows APIs (such as CreateFileW, RegSetValueEx, send, etc.) are hooked through API monitoring technology (such as using the Microsoft Detours library). When user operations trigger these API calls, the call stack is analyzed and combined with the information of the currently active window and focused control to dynamically establish the association between this operation (control) and the underlying functional components (the API being called and its module). Dependencies between functional components are obtained in advance or dynamically through system architecture documents, dynamic link library (DLL) import / export table analysis, or runtime call graphs. The dependency graph includes static or dynamic dependencies between various functional components in the system. These dependencies include, but are not limited to: component A calling services provided by component B, component A reading or modifying data stored by component B, and a state change in component A triggering the execution flow of component B. 
The dependencies between functional components can be obtained through one or more of the following methods: (a) A predefined system resource access rule base (e.g., process A frequently accesses registry key B); (b) Dynamically analyze the call relationship graph generated by application behavior in a controlled environment; (c) Process-resource dependency information provided by the operating system's resource monitoring interface (such as Windows' Restart Manager API).

[0056] Based on the above mapping, the list of functional components directly called by the current operation can be determined, and a graph traversal analysis can be performed according to the dependency graph to recursively find all affected upstream and downstream components, forming a "set of affected functional components". The traversal method is depth-first search or breadth-first search: starting from the starting node, all reachable nodes are recursively or iteratively visited along the direction of the dependency edges. All nodes (i.e. functional components) collected during the traversal process constitute a set of affected functional components. This set not only includes the directly called components, but also all downstream or related components that are potentially affected indirectly due to their dependencies.
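The traversal described above can be sketched as a breadth-first search. The dependency graph is assumed here to be an adjacency mapping from each component to the components reached along its dependency edges; the component names in the usage example are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

def affected_components(direct: Iterable[str],
                        dependency_graph: Dict[str, List[str]]) -> Set[str]:
    """Collect the directly invoked components plus everything reachable from them."""
    seen: Set[str] = set(direct)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for neighbor in dependency_graph.get(node, ()):
            if neighbor not in seen:  # visit each component once, so cycles terminate
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

# Usage with hypothetical component names:
graph = {
    "file_save": ["disk_io", "backup_service"],
    "backup_service": ["network_send"],
    "network_send": ["file_save"],  # cycle back to the start
}
print(sorted(affected_components(["file_save"], graph)))
```

The size of the returned set is the raw count of affected components; a depth-first search would produce the same set and differs only in visiting order.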

[0057] It should be understood that the above-mentioned method of obtaining the set of functional components is only an example, and its purpose is to determine the scope of functional modules related to the current operation. This invention does not limit the specific means of obtaining them.

[0058] Based on the set of functional components and the operation confidence, a continuous-valued behavior modulation parameter is generated using the Sigmoid function or a weighted summation function. When the Sigmoid function is used, the specific formula is as follows: … When the weighted summation function is used, the specific formula is as follows: … In these formulas, the cardinality term is the number of functional components contained in the normalized functional component set; the operation confidence has its own weighting coefficient (in this embodiment, …); the slope parameter (or adjustment coefficient) of the Sigmoid function controls the steepness of the function curve (in this embodiment, …); and a further weighting coefficient adjusts the influence of the number of functional components (in this embodiment, …). The functional component set is normalized as follows: (1) Determine the maximum impact scale. During the system deployment or training phase, a "maximum expected number of affected components" is preset by analyzing historical logs, the system architecture, or security policies; for example, based on an analysis of typical violations (such as large-scale file encryption or registry traversal modification), an upper limit on the number of functional components that a single operation may trigger is determined, such as 50. (2) Real-time statistics and truncation. When a user performs an operation, the system uses static mapping and dynamic hook technology to track and count, in real time, the number of functional components directly and indirectly affected by the operation, giving the raw count N. (3) Calculate the normalized value.
Divide the raw count N by the preset maximum impact scale; if the result is greater than 1 (that is, the actual impact exceeds the expected maximum range), it is set to 1.0, indicating that the risk has reached the theoretical maximum. The output is a normalized value between 0 and 1 that directly represents the relative severity of the impact range of the current operation: the larger the value, the wider the part of the system affected by the operation and the higher the potential risk.
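The normalization steps can be sketched directly, since they are fully specified above. The Sigmoid mapping that follows is only an assumed form: the way confidence and impact are combined, and the parameters `w_conf`, `w_impact`, `slope`, and `offset`, are hypothetical and are not values from this embodiment.

```python
import math

def normalized_impact(raw_count: int, max_expected: int = 50) -> float:
    """Divide the raw component count N by the preset maximum and truncate at 1.0."""
    if max_expected <= 0:
        raise ValueError("max_expected must be positive")
    return min(max(raw_count, 0) / max_expected, 1.0)

def modulation_sigmoid(confidence: float, impact: float,
                       w_conf: float = 1.0, w_impact: float = 1.0,
                       slope: float = 4.0, offset: float = 1.0) -> float:
    """ASSUMED form: low confidence and wide impact both push the parameter toward 1."""
    risk = w_conf * (1.0 - confidence) + w_impact * impact
    return 1.0 / (1.0 + math.exp(-slope * (risk - offset)))

# Usage: an operation touching 80 components with low confidence.
impact = normalized_impact(80)  # truncated to 1.0
beta = modulation_sigmoid(confidence=0.2, impact=impact)
```

Whatever combination is actually used, the output stays strictly between 0 and 1, which keeps the subsequent time-constant modulation bounded.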

[0059] In one embodiment, the base time constant of the liquid neural network is dynamically modulated as follows: the adjusted base time constant is set to …, where the adjusted base time constant is constrained within a preset positive range, and the formula involves the initial base time constant, the behavior modulation parameter, and an adjustment coefficient greater than zero (in this embodiment, …). Figure 3 shows how the behavior modulation parameter changes over time and its dynamic adjustment effect on the base time constant. When the behavior modulation parameter increases, the base time constant decreases accordingly, which changes the scale over which the network attenuates historical information and improves its responsiveness to risk situations.

[0060] It should be noted that when the behavior modulation parameter is large (close to 1), the current operation is high-risk. In this case the base time constant is significantly reduced, the liquid neural network's memory of historical information is shortened, and its response to the current input becomes more sensitive, which favors rapid detection of abnormal patterns. When the behavior modulation parameter is small (close to 0), the current operation is low-risk. In this case the base time constant stays close to its initial value and the network keeps a long memory, which favors recognizing operation patterns that require long-term context. The operation confidence threshold is set to 0.7: when the operation confidence is ≥ 0.7, risk assessment and parameter adjustment are skipped and the base time constant is kept unchanged, to avoid unnecessary interference with normal operation. In one embodiment, a stability index, such as the hidden state variance, the state change rate, or the spectral radius of the system, can be calculated from the changes in the hidden state trajectory of the continuous-time model before and after modulation. When the stability index exceeds a preset threshold, the magnitude or rate of change of the behavior modulation parameter is limited.
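The modulation behavior described above can be sketched as follows. The skip rule at confidence ≥ 0.7 and the clamping to a positive range follow the text; the specific decreasing form tau0 / (1 + k * beta), the coefficient `k`, and the bounds `tau_min` and `tau_max` are assumptions made for illustration only.

```python
def adjust_time_constant(tau0: float, beta: float, confidence: float,
                         k: float = 1.0, tau_min: float = 0.01, tau_max: float = 10.0,
                         confidence_threshold: float = 0.7) -> float:
    """Return the base time constant to use for the next evolution step."""
    if confidence >= confidence_threshold:
        # High-confidence operation: skip modulation and keep the constant unchanged.
        return tau0
    # ASSUMED form: the time constant shrinks as the modulation parameter grows.
    tau = tau0 / (1.0 + k * beta)
    # Keep the adjusted value inside a preset positive range.
    return min(max(tau, tau_min), tau_max)
```

Any monotonically decreasing mapping would reproduce the qualitative behavior; the clamp is what prevents the time constant from collapsing toward zero and destabilizing the numerical integration.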

[0061] S4: Based on the modulated continuous-time modeling results, evolutionary features are extracted, and whether the operation is a violation is determined according to the evolutionary features.

[0062] In this embodiment, the operation trajectory of the next time step is input into the liquid neural network whose time constant has been adjusted, and the continuous-time dynamic evolution process is repeated. Three evolutionary features are extracted from the hidden state sequence: the magnitude of change, the rate of change, and the deviation from the baseline features established during historical compliant operation periods. The specific formula for the magnitude of change is as follows: … The specific formula for the rate of change is as follows: … The specific formula for the deviation from the historical compliant operation baseline is as follows: … The quantities in these formulas are: the magnitude of state change of the i-th neuron; the start time and the end time of the time window; the vector composed of the change magnitudes of all neurons; the instantaneous rate of change of the i-th neuron at a given time; the average rate of change of the i-th neuron; the total number of time steps within the time window; the vector composed of the average rates of change of all neurons; the total number of neurons in the hidden layer; the state of the i-th neuron at a given time in the compliance baseline; the point-wise state deviation of the i-th neuron at a given time; the average deviation of the i-th neuron; the standard deviation of the state of the i-th neuron in the compliance baseline; and the normalized relative deviation of the i-th neuron. The three features are concatenated to form the final evolutionary feature vector: …, which is the normalized evolutionary feature vector.
In this embodiment, the specific implementation process is as follows: a sliding window buffer is maintained to store the hidden states of the most recent W time steps (W = 10 in this embodiment, corresponding to a 1-second operation window); for each new time step, the statistical features of the states within the window are calculated; the compliance baseline features are obtained by offline analysis of a large amount of normal operation data and are stored as the mean and standard deviation time series of each neuron in a typical operation mode.
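The sliding-window procedure can be sketched as follows. The concrete definitions used here are assumptions: the magnitude is taken as the absolute end-to-start difference, the rate as the mean absolute step difference divided by the step length, and the deviation as the mean absolute distance from a per-neuron baseline mean divided by the baseline standard deviation. The baseline is also simplified to one mean and one standard deviation per neuron rather than a time series.

```python
from collections import deque
from typing import Deque, List

Vector = List[float]

def evolution_features(window: List[Vector], baseline_mean: Vector,
                       baseline_std: Vector, dt: float = 0.1,
                       eps: float = 1e-8) -> Vector:
    """Concatenate per-neuron magnitude, average rate, and normalized deviation."""
    steps = len(window)
    if steps < 2:
        raise ValueError("need at least two hidden states in the window")
    neurons = len(window[0])
    magnitude = [abs(window[-1][i] - window[0][i]) for i in range(neurons)]
    rate = [sum(abs(window[t][i] - window[t - 1][i]) for t in range(1, steps))
            / ((steps - 1) * dt) for i in range(neurons)]
    deviation = [sum(abs(window[t][i] - baseline_mean[i]) for t in range(steps))
                 / steps / (baseline_std[i] + eps) for i in range(neurons)]
    return magnitude + rate + deviation

# Sliding buffer holding the most recent W = 10 hidden states (a 1-second window).
buffer: Deque[Vector] = deque(maxlen=10)

def push_and_extract(h: Vector, baseline_mean: Vector, baseline_std: Vector) -> Vector:
    """Append the newest hidden state; return features once two states are buffered."""
    buffer.append(list(h))
    if len(buffer) < 2:
        return []
    return evolution_features(list(buffer), baseline_mean, baseline_std)
```

Because the `deque` has `maxlen=10`, the oldest state is dropped automatically when a new one arrives, so each call costs time proportional to the window size.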

[0063] In this embodiment, if the cosine similarity between the evolutionary feature vector and the feature vector of every pattern in the compliant operation pattern library is lower than a preset similarity threshold, the corresponding user operation is determined to be a violation, and the corresponding recognition result is output. The preset similarity threshold is tuned on the validation set to balance the false positive rate against the false negative rate; in this embodiment, the threshold is set to 0.7.

[0064] Specifically, the cosine similarity between the evolutionary feature vector and the feature vector of each pattern in the compliant operation pattern library is calculated as follows:

$$\mathrm{sim}(E, p_k) = \frac{E \cdot p_k}{\lVert E \rVert \, \lVert p_k \rVert}$$

where $E$ is the evolutionary feature vector and $p_k$ is the feature vector of the $k$-th pattern in the library. The decision logic is as follows: if the operation confidence … and the similarity …, the operation is determined to be compliant; if the operation confidence … and the similarity …, it is judged to be a high-risk violation and a high-risk alarm is generated; if the operation confidence … and the similarity …, it is determined to be a potential violation and a general alarm is generated; all other cases are deemed suspicious and require further manual review.

[0065] Figure 4 shows the cosine similarity sequence between the evolutionary feature vector and the compliant operation pattern library for each time window, with the threshold line representing the preset similarity threshold; when the similarity falls below the threshold line, the corresponding operation is determined to be a violation and the recognition result is output.
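The threshold rule over a sequence of time windows can be sketched as follows. It uses the 0.7 threshold of this embodiment and flags a window when its best similarity to any compliant pattern falls below the threshold; the function names are hypothetical, and the multi-level alarm logic is not reproduced because its conditions are not given in this text.

```python
import math
from typing import List

Vector = List[float]

def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def flag_violations(feature_sequence: List[Vector], patterns: List[Vector],
                    threshold: float = 0.7) -> List[int]:
    """Return the indices of time windows whose best similarity is below the threshold."""
    flagged = []
    for index, features in enumerate(feature_sequence):
        best = max(cosine_similarity(features, p) for p in patterns)
        if best < threshold:  # below every compliant pattern, so treat as a violation
            flagged.append(index)
    return flagged
```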

[0066] Embodiment 2: Figure 5 shows a graphical user interface (GUI) operation behavior detection system, comprising: a data acquisition module, used to collect the control attributes and screen image data of the user in the graphical interface, perform time alignment and feature fusion, and generate the operation trajectory; a continuous-time modeling module, used to perform continuous-time modeling of the operation trajectory using a liquid neural network; an anomaly analysis and modulation module, used to analyze the operational anomaly degree based on the output of continuous-time modeling to obtain the behavior modulation parameter, and to use this parameter to modulate the dynamic parameters of continuous-time modeling; and a feature extraction and judgment module, used to extract evolutionary features based on the modulated continuous-time modeling results, and to determine whether the operation is a violation according to the evolutionary features.

[0067] The above formulas are all dimensionless calculations. They were obtained through software simulation on a large amount of collected data so as to approximate real-world conditions as closely as possible. The preset parameters in the formulas are set by those skilled in the art according to the actual situation.

[0068] The above embodiments can be implemented, in whole or in part, by software, hardware, firmware, or any other combination thereof. When implemented using software, the above embodiments can be implemented, in whole or in part, in the form of a computer program product.

[0069] Those skilled in the art will recognize that the modules and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0070] In addition, the functional modules in the various embodiments of this application can be integrated into one processing module, or each module can exist physically separately, or two or more modules can be integrated into one module.

[0071] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

[0072] In conclusion, the above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A method for detecting graphical user interface operation behavior, characterized in that it comprises: collecting control attributes and screen image data of a user in the graphical interface, performing time alignment and feature fusion, and generating an operation trajectory; performing continuous-time modeling of the operation trajectory using a liquid neural network; based on the output of the continuous-time modeling, analyzing the operational anomaly degree to obtain a behavior modulation parameter, and using this parameter to modulate the dynamic parameters of the continuous-time modeling; and extracting evolutionary features based on the modulated continuous-time modeling results, and determining whether the operation is a violation according to the evolutionary features.

2. The graphical user interface operation behavior detection method according to claim 1, characterized in that the continuous-time modeling of the operation trajectory is achieved through a continuous-time model with time-scale adaptability.

3. The graphical user interface operation behavior detection method according to claim 2, characterized in that the liquid neural network adaptively captures multi-scale temporal patterns in the operation trajectory through the dynamics of its internal state, which changes continuously over time.

4. The graphical user interface operation behavior detection method according to claim 1, characterized in that analyzing the operational anomaly degree based on the output of the continuous-time modeling to obtain the behavior modulation parameter comprises: extracting operation features from the operation record data output by the continuous-time modeling to generate operation feature vectors; calculating the matching degree between the operation feature vectors and the compliant operation pattern library to obtain the operation confidence; determining, based on the mapping relationship between interface controls and system functional components, the set of functional components affected by the current operation; and obtaining, based on the set of functional components and the operation confidence, a time-continuous behavior modulation parameter through nonlinear mapping.

5. The graphical user interface operation behavior detection method according to claim 1, characterized in that using this parameter to modulate the dynamic parameters of the continuous-time modeling comprises: dynamically modulating the base time constant of the liquid neural network based on the behavior modulation parameter to obtain an adjusted base time constant, thereby changing the continuous-time response scale of the liquid neural network; and dynamically adjusting the continuous-time state update process of the liquid neural network based on the adjusted base time constant.

6. The graphical user interface operation behavior detection method according to claim 1, characterized in that generating the operation trajectory comprises: building control relationships based on UI control attributes; obtaining the position information and category attributes of interface elements; performing text recognition on the screen image to obtain text content; associating the text content with the interface elements based on spatial inclusion relationships to form the semantic information of the interface; and fusing the control relationships with the semantic information of the interface and encoding the result using a temporal encoder to generate the operation trajectory.

7. The graphical user interface operation behavior detection method according to claim 1, characterized in that, before performing time alignment and feature fusion to generate the operation trajectory, the method further comprises: associating and mapping the interface elements identified in the image with the interface controls; for the mapped interface elements, comparing the text content obtained by image recognition with the corresponding attribute text content obtained from the system interface; and, when the two are inconsistent, determining that the corresponding control attribute data source is untrustworthy, generating an alarm, and using historical reliable features for interpolation in subsequent feature fusion.

8. The graphical user interface operation behavior detection method according to claim 1, characterized in that the method further comprises: calculating a system stability index based on the hidden state trajectories of the continuous-time model before and after modulation; and, when the stability index changes abnormally, constraining the magnitude or rate of change of the behavior modulation parameter so as to limit the state evolution of the liquid neural network to a preset stability domain.

9. The graphical user interface operation behavior detection method according to claim 1, characterized in that the evolutionary features include at least the magnitude of change of the hidden state vector, its rate of change, or its degree of deviation from the baseline features established during historical compliant operation periods.

10. A graphical user interface operation behavior detection system for implementing the method according to any one of claims 1 to 9, characterized in that the system comprises: a data acquisition module, used to collect the control attributes and screen image data of the user in the graphical interface, perform time alignment and feature fusion, and generate the operation trajectory; a continuous-time modeling module, used to perform continuous-time modeling of the operation trajectory using a liquid neural network; an anomaly analysis and modulation module, used to analyze the operational anomaly degree based on the output of continuous-time modeling to obtain the behavior modulation parameter, and to use this parameter to modulate the dynamic parameters of continuous-time modeling; and a feature extraction and judgment module, used to extract evolutionary features based on the modulated continuous-time modeling results, and to determine whether the operation is a violation according to the evolutionary features.