Internal user anomaly and data leakage monitoring method based on AI behavior analysis
By introducing terminal environment parameters into behavior analysis, decoupling operational behavior data from environmental conditions, constructing a standardized feature set, and calculating the proportion of environment-induced behavior drift, the method addresses the accuracy problem of monitoring user anomalies and data leakage in multi-terminal and multi-network environments, achieving stable and accurate monitoring in complex environments.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: SHENZHEN SYNTEK NETWORK SECURITY CO LTD
- Filing Date: 2026-01-30
- Publication Date: 2026-06-19
AI Technical Summary
Existing AI-based behavioral analysis methods for monitoring internal user anomalies and data breaches struggle to accurately distinguish between normal behavioral fluctuations caused by environmental changes and risky behaviors driven by abnormal intentions in multi-terminal and multi-network environments, leading to increased false alarms or missed anomalies.
By acquiring operational behavior data of internal users under different terminal forms and network environments, environmental conditions are decoupled, a standardized behavioral feature set is constructed, and behavioral structural features that represent changes in user operation rhythm, fluctuations in command granularity, and evolution of access paths are extracted. The proportion of environment-induced behavioral drift is calculated, and combined with data access depth, access frequency, and cross-system correlation, the risk index of abnormal behavior of internal users is calculated.
The method accurately identifies the source of behavioral fluctuations in multi-terminal and multi-network environments, reduces the probability of false alarms caused by environmental changes, improves the ability to identify covert and progressive data leakage behaviors, and achieves stable and accurate monitoring of abnormal behavior and data leakage risks.
Figure CN122241728A_ABST
Description
Technical Field
[0001] This invention relates to the field of data breach monitoring technology, and more specifically, to a method for monitoring internal user anomalies and data breaches based on AI behavioral analysis.
Background Technology
[0002] In the actual operating environment of enterprise information systems, internal users' operations are often not completed under a single, stable technical condition. They frequently occur across different network environments such as the company intranet, remote office networks, and VPNs, accompanied by the alternating use of various terminal forms such as desktops, laptops, and virtual desktops. At the same time, with the widespread adoption of automated operation and maintenance and scripted operations, human interaction and automated program execution under the same user identity alternate over time. This switching between multiple terminals and environments results in significant differences in user operation rhythm, command trigger intervals, command granularity, and data access paths. These differences do not stem from changes in the user's business intent, but are introduced by objective conditions such as differences in terminal physical performance, network latency fluctuations, and changes in input methods. Existing AI-based behavioral analysis methods for monitoring internal user anomalies and data breaches typically assume that user behavior features have a relatively stable statistical structure in the embedding space. When non-business factors continuously disturb the behavioral features, the original behavioral baseline and anomaly discrimination boundary undergo structural shifts, making it difficult for the model to accurately distinguish between normal behavioral fluctuations caused by environmental changes and real risk behaviors driven by abnormal intentions. Ultimately, this leads to an increase in false alarms or missed anomalies in complex office environments.
Summary of the Invention
[0003] To overcome the aforementioned deficiencies of the prior art, embodiments of the present invention provide an internal user anomaly and data leakage monitoring method based on AI behavior analysis, in order to solve the problems mentioned in the background art.
[0004] To achieve the above objectives, the present invention provides the following technical solution: a method for monitoring internal user anomalies and data breaches based on AI behavioral analysis, including the following steps:
Acquiring operational behavior data of internal users under different terminal forms and network environments, as well as the corresponding terminal environment parameters;
Based on the terminal environment parameters, decoupling the operational behavior data from the environmental conditions to construct a standardized behavioral feature set;
Based on the standardized behavioral feature set, extracting behavioral structural features representing changes in user operation rhythm, fluctuations in command granularity, and evolution of access paths, to construct a behavior drift description set;
Based on the behavior drift description set, distinguishing the behavioral structure shift component caused by changes in the terminal environment from the behavioral shift component caused by changes in user operation intent, and calculating the proportion of environment-induced behavior drift;
Based on the proportion of environment-induced behavior drift and the user's historical behavior baseline, calculating the user's behavioral intent offset after removing the environmental influence;
Based on the behavioral intent offset, combined with data access depth, access frequency, and cross-system correlation, calculating the internal user abnormal behavior risk index and generating a data leakage risk assessment result.
[0005] In a preferred embodiment, the process of acquiring operational behavior data of internal users under different terminal forms and network environments, as well as the corresponding terminal environment parameters, is as follows: Deploy behavior audit collection modules and environmental status awareness modules at the access nodes of the enterprise's internal business systems, data systems, and operation and maintenance systems to continuously collect user operation behavior data and their corresponding terminal environment parameters during normal user business operations and terminal environment switching. Set a uniform behavior sampling granularity for the collected operational behavior data and terminal environment parameters, and attach a precise timestamp and environment identifier to each operational behavior; When terminal switching, network state change and operation mode change events are detected, the corresponding time is marked as the environment switching trigger point. Continuous observation windows are set before and after the trigger point to jointly sample user operation behavior and environmental parameters to obtain a complete behavior evolution sequence covering the stable stage before environment switching, the switching transition stage and the adaptation stage after switching. The behavior evolution sequence is jointly mapped with the terminal environment parameters at the corresponding time to construct a behavior-environment related sample; Based on the behavioral environment associated samples, the timing of the operation behavior is processed by a sliding window to extract behavioral representation parameters such as the operation time interval distribution, instruction call density and access path length, forming basic behavioral feature components. 
Multiple behavioral feature components obtained under the same terminal environment are combined to construct a corresponding subset of operational behavioral features, which together with the subsets of behavioral features formed under different terminal forms and network environment conditions constitute an operational behavioral dataset under multiple terminal and multiple environment conditions.
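The sliding-window extraction of basic behavioral feature components described above can be illustrated with a minimal sketch. The event layout `(timestamp, call_type, access_path)`, the function name `window_features`, and the choice of mean and standard deviation to summarize the interval distribution are assumptions made for illustration; the patent text does not prescribe them.

```python
from statistics import mean, pstdev

def window_features(events, window_s, step_s):
    """Extract basic behavioral feature components per sliding window.

    events: chronologically sorted list of (timestamp_seconds, call_type, access_path).
    This tuple layout is a hypothetical format chosen for the sketch.
    """
    if not events:
        return []
    feats = []
    start, end = events[0][0], events[-1][0]
    while start <= end:
        win = [e for e in events if start <= e[0] < start + window_s]
        if win:
            ts = [e[0] for e in win]
            gaps = [b - a for a, b in zip(ts, ts[1:])]  # operation time intervals
            feats.append({
                "window_start": start,
                "interval_mean": mean(gaps) if gaps else 0.0,
                "interval_std": pstdev(gaps) if gaps else 0.0,
                # instruction call density: calls per second within the window
                "call_density": len(win) / window_s,
                # access path length: number of path levels, averaged over the window
                "mean_path_length": mean(len(e[2].strip("/").split("/")) for e in win),
            })
        start += step_s
    return feats

events = [(0, "read", "/a/b"), (2, "read", "/a/b/c"), (4, "write", "/a"), (12, "read", "/x/y")]
features = window_features(events, window_s=10, step_s=10)
```

Setting `step_s` smaller than `window_s` yields overlapping windows; the text only requires that the operation sequence be processed by a sliding window, so either choice is compatible with it.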
[0006] In a preferred embodiment, the process of decoupling the operational behavior data from environmental conditions based on the terminal environment parameters and constructing a standardized behavior feature set is as follows: Based on the acquired behavioral environment-related samples, the terminal type, network connection method, and network latency status are discretized to form an environmental condition description vector; Based on the environmental condition description vector, the operation behavior sequence is divided into several environmentally consistent behavior segments; For consistent behavior segments in various environments, the basic behavioral characteristics of operation time interval, instruction call density and access path length are statistically analyzed, and network latency and terminal performance factors are introduced to perform environmental compensation correction on the basic behavioral characteristics. After completing the environmental compensation correction, the behavioral feature components are normalized to map the behavioral features of different dimensions to a unified numerical range, thus obtaining a standardized behavioral feature vector. The standardized behavioral feature vectors obtained from different behavioral segments are concatenated in chronological order to construct a standardized behavioral feature set that spans multiple terminals and network environments.
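As a rough illustration of the discretization and segmentation steps, the sketch below builds an environmental condition description vector per sample and splits a chronological sample sequence into environmentally consistent segments. The field names (`terminal`, `network`, `latency_ms`) and the latency cut points of 50 ms and 200 ms are hypothetical values, not taken from the patent.

```python
from itertools import groupby

def latency_bucket(latency_ms):
    # Hypothetical cut points for discretizing the network latency status.
    if latency_ms < 50:
        return "low"
    if latency_ms < 200:
        return "medium"
    return "high"

def env_vector(sample):
    """Environmental condition description vector for one behavior sample."""
    return (sample["terminal"], sample["network"], latency_bucket(sample["latency_ms"]))

def split_segments(samples):
    """Split chronological samples into runs that share one environment vector.

    itertools.groupby groups only consecutive equal keys, so a return to an
    earlier environment starts a new segment instead of merging with the old one.
    """
    return [(env, list(group)) for env, group in groupby(samples, key=env_vector)]

samples = [
    {"t": 0, "terminal": "desktop", "network": "intranet", "latency_ms": 10},
    {"t": 5, "terminal": "desktop", "network": "intranet", "latency_ms": 20},
    {"t": 9, "terminal": "laptop", "network": "vpn", "latency_ms": 250},
    {"t": 14, "terminal": "desktop", "network": "intranet", "latency_ms": 15},
]
segments = split_segments(samples)
```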
[0007] In a preferred embodiment, the process of extracting behavioral structural features representing changes in user operation rhythm, command granularity fluctuations, and access path evolution based on the standardized behavioral feature set, and constructing a behavioral drift description set, is as follows: Based on the standardized behavioral feature set, user operation behaviors are continuously divided in chronological order to form multiple interconnected behavioral analysis windows; Within each behavior analysis window, statistical analysis is performed on the standardized operation time interval sequence to extract operation rhythm fluctuation characteristics, including the degree of operation rhythm fluctuation. For the command granularity variation characteristics, the distribution of different types of instructions and interface calls is statistically analyzed in each behavior analysis window, and the command granularity ratio of high-granularity operations and low-granularity operations is calculated. The user data access path is serialized and compared between behavior analysis windows to extract access path evolution features, including access path evolution distance. The characteristics of operation rhythm fluctuation, command granularity change, and access path evolution are combined to construct the behavior structure feature vector of the corresponding behavior analysis window; By performing temporal fusion on the behavioral structure feature vectors in the continuous behavior analysis window, a behavior drift description set is obtained.
[0008] In a preferred embodiment, the process of distinguishing between behavioral structure shift components caused by changes in the terminal environment and behavioral shift components caused by changes in user operation intent, based on the behavioral drift description set, and calculating the proportion of environment-induced behavioral drift is as follows: Based on the behavior drift description set, the behavior structure feature vectors between adjacent windows are differentially processed according to the time order of the behavior analysis windows to obtain the total behavior drift vector. The total behavior drift vector is jointly analyzed with the changes in terminal environment parameters within the corresponding window to construct a behavior response reference vector driven by environmental changes. By calculating the projection component of the total behavior drift vector onto the direction of the behavior response reference vector driven by environmental change, the environment-induced behavior drift component is extracted. The proportion of environmentally induced behavioral drift is calculated based on the magnitude relationship between the environmentally induced behavioral drift component and the total behavioral drift vector. By performing time-series fusion of the proportion of environmentally induced behavior drift within the continuous behavior analysis window, a sequence of environmentally induced behavior drift proportions is formed.
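The projection step can be sketched with plain vector arithmetic. The example assumes the environment-driven behavior response reference vector `env_ref` is already available (how it is built from the environment parameter changes is not detailed here), and it reads the "magnitude relationship" as the ratio of the two vector norms; both are assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def env_drift_ratio(prev_vec, curr_vec, env_ref):
    """Return (environment-induced drift component, its share of the total drift).

    prev_vec, curr_vec: behavior structure feature vectors of adjacent windows.
    env_ref: assumed reference direction of environment-driven behavior change.
    """
    total = [c - p for p, c in zip(prev_vec, curr_vec)]  # total behavior drift vector
    total_norm, ref_norm = norm(total), norm(env_ref)
    if total_norm == 0 or ref_norm == 0:
        return [0.0] * len(total), 0.0
    scale = dot(total, env_ref) / (ref_norm ** 2)
    env_component = [scale * r for r in env_ref]  # projection onto env_ref
    ratio = min(1.0, norm(env_component) / total_norm)
    return env_component, ratio

component, ratio = env_drift_ratio([0.0, 0.0], [3.0, 4.0], [1.0, 0.0])
```

In this toy case the total drift is `[3, 4]` and the reference direction is the first axis, so the environment-induced component is `[3, 0]` and accounts for 3/5 of the drift magnitude.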
[0009] In a preferred embodiment, the process of calculating the user behavior intent offset after removing environmental influences, based on the proportion of environmentally induced behavior drift and the user's historical behavior baseline, is as follows: Based on the environmental-induced behavior drift ratio sequence, the total behavior drift vector within the user's current behavior analysis window is subjected to environmental impact reduction processing to obtain a net behavior change vector that has been preliminarily removed from environmental disturbances. After obtaining the net behavior change vector, it is aligned and compared with the user's historical behavior baseline. By calculating the degree of deviation of the net behavior change vector from the historical behavior baseline, the user's behavioral intent offset is extracted. The user behavior intent offset obtained from the continuous behavior analysis window is smoothed to form a user behavior intent offset sequence.
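A minimal sketch of this step follows, under three labeled assumptions: the environmental impact reduction scales the total drift vector by one minus the environment-induced ratio, the degree of deviation is the Euclidean distance to a baseline vector, and the smoothing is an exponential moving average. The text names none of these specific choices.

```python
import math

def net_change(total_drift, env_ratio):
    """Net behavior change vector (assumed: scale by the non-environmental share)."""
    return [(1.0 - env_ratio) * d for d in total_drift]

def intent_offset(net, baseline):
    """Deviation of the net change from the historical baseline (assumed Euclidean)."""
    return math.sqrt(sum((n - b) ** 2 for n, b in zip(net, baseline)))

def smooth(values, alpha=0.5):
    """Exponential moving average over per-window offsets (assumed smoothing method)."""
    out = []
    for v in values:
        out.append(v if not out else alpha * v + (1 - alpha) * out[-1])
    return out

net = net_change([3.0, 4.0], env_ratio=0.6)
offset = intent_offset([3.0, 4.0], [0.0, 0.0])
offset_sequence = smooth([0.0, 2.0, 4.0], alpha=0.5)
```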
[0010] In a preferred embodiment, the process of calculating the internal user abnormal behavior risk index and generating a data leakage risk assessment result based on the behavioral intent offset, combined with data access depth, access frequency, and cross-system correlation, is as follows:
Based on the user behavior intent offset sequence, the data access behavior features corresponding to the current behavior analysis window are obtained and structured into an access frequency feature, an access depth feature, and a cross-system switching frequency feature.
The user behavior intent offset is jointly modeled with the access frequency feature, access depth feature, and cross-system switching frequency feature to construct an internal user abnormal behavior risk assessment function and calculate the internal user abnormal behavior risk index.
A data leakage risk assessment result is generated by comparing the internal user abnormal behavior risk index with a preset internal user abnormal behavior risk index threshold:
When the risk index is greater than or equal to the threshold, the current internal user behavior is determined to be in a high-risk abnormal state, and a result is generated indicating that the internal user has a high risk of data leakage.
When the risk index is less than the threshold, the current internal user behavior is determined to be in a low-risk state, and a low-risk data leakage assessment result is generated.
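The threshold decision can be sketched as follows. The form of the risk assessment function is not given in the text, so this example assumes a weighted sum of inputs already scaled to the range 0 to 1; the weights and the threshold of 0.6 are hypothetical values.

```python
def risk_index(offset, freq, depth, cross, weights=(0.4, 0.2, 0.2, 0.2)):
    """Assumed risk assessment function: weighted sum of inputs clipped to [0, 1].

    offset: behavioral intent offset; freq: access frequency feature;
    depth: access depth feature; cross: cross-system switching frequency feature.
    """
    clipped = [min(1.0, max(0.0, f)) for f in (offset, freq, depth, cross)]
    return sum(w * f for w, f in zip(weights, clipped))

def assess(index, threshold=0.6):
    """High risk when the index is greater than or equal to the threshold."""
    return "high" if index >= threshold else "low"

index = risk_index(offset=0.9, freq=0.5, depth=0.8, cross=0.2)
result = assess(index)
```

The comparison uses `>=` so that an index exactly equal to the threshold is classified as high risk, matching the two cases stated in the text.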
[0011] The technical effects and advantages of this invention are as follows: 1. This invention introduces terminal environment parameters during the behavior modeling stage and decouples operational behavior data from environmental conditions. This effectively separates non-business factors such as terminal performance differences, network latency fluctuations, and input method changes from user behavior features. Under actual operating conditions with frequent switching between multiple terminals and network environments, user behavior features can be mapped to a unified and comparable standardized feature space, thus avoiding overall behavior baseline drift caused by changes in objective technical conditions. By extracting behavioral structural features reflecting the evolution of operation rhythm, command granularity, and access paths, and constructing a behavior drift description set, this invention quantitatively distinguishes between environment-induced structural shifts and changes in actual user operation intent. It thereby identifies which behavioral fluctuations originate from changes in terminal and network conditions, and which behavioral deviations stem from potential abnormal or risky intentions. By combining the user's historical behavior baseline, it calculates the behavioral intent offset after removing environmental influences, and integrates this offset with risk-sensitive indicators such as data access depth, access frequency, and cross-system correlation to form an internal user abnormal behavior risk index with a clear causal explanation path. This significantly reduces the probability of false alarms caused by environmental changes in complex office environments and scenarios with widespread automated operations, while improving the ability to identify covert and gradual data leakage behaviors, achieving stable, accurate, and explainable monitoring of internal user abnormal behavior and data leakage risks.
Attached Figure Description
[0012] To facilitate understanding by those skilled in the art, the present invention will be further described below with reference to the accompanying drawings. Figure 1 is a flowchart of a method according to an embodiment of the present invention.
Detailed Implementation
[0013] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0014] Embodiment: As shown in Figure 1, this invention presents a method for monitoring internal user anomalies and data leaks based on AI behavior analysis, comprising the following steps:
Acquiring operational behavior data generated by internal users under different terminal forms and network environments, as well as the corresponding terminal environment parameters. The operational behavior data includes at least the operation time interval, instruction call characteristics, data access path, and data access object information. The terminal environment parameters include at least the terminal type, network status, and input method identifier;
Based on the terminal environment parameters, decoupling the operational behavior data from the environmental conditions to construct a standardized behavioral feature set that eliminates the impact of terminal performance differences and network latency;
Based on the standardized behavioral feature set, extracting behavioral structural features representing changes in user operation rhythm, fluctuations in command granularity, and evolution of access paths, to construct a behavior drift description set under multi-terminal and multi-environment conditions;
Based on the behavior drift description set, distinguishing the behavioral structure shift component caused by changes in the terminal environment from the behavioral shift component caused by changes in user operation intent, and calculating the proportion of environment-induced behavior drift;
Based on the proportion of environment-induced behavior drift and the user's historical behavior baseline, calculating the user's behavioral intent offset after removing the environmental influence;
Based on the behavioral intent offset, combined with data access depth, access frequency, and cross-system correlation, calculating an internal user abnormal behavior risk index and generating a data leakage risk assessment result, so as to achieve accurate monitoring of internal user anomalies and data leakage behaviors in multi-terminal and multi-environment switching scenarios.
[0015] In this embodiment of the invention, the process of acquiring operational behavior data generated by internal users under different terminal forms and network environments, as well as corresponding terminal environment parameters, wherein the operational behavior data includes at least operational time intervals, command call characteristics, data access paths, and data access object information, and the terminal environment parameters include at least terminal type, network status, and input method identifiers, is as follows: Deploy behavior audit collection modules and environmental status awareness modules at the access nodes of the enterprise's internal business systems, data systems, and operation and maintenance systems to continuously collect user operation behavior data and their corresponding terminal environment parameters during normal user business operations and terminal environment switching. The operation behavior data includes the operation trigger time, the time interval between adjacent operations, the instruction and interface call type, the data access path level, and the data access object identifier. The terminal environment parameters include terminal type, network connection method, network latency status, and input method identifier; each acquisition module aligns with the unified time synchronization service to ensure consistency of different behavioral events and environmental states in the time dimension. 
A unified behavior sampling granularity is set for the collected operation behavior data and terminal environment parameters, and a precise timestamp and environment identifier are added to each operation behavior, so that the user's operation process under different terminal and network conditions forms a continuous and traceable behavior-environment joint time sequence on the time axis; Preferably, the behavior sampling granularity is dynamically adjusted according to the user's typical operation rhythm and the system log generation frequency to ensure that changes in operation rhythm caused by network fluctuations and differences in terminal performance can be captured; When terminal switching, network state change and operation mode change events are detected, the corresponding time is marked as the environment switching trigger point. Continuous observation windows are set before and after the trigger point to jointly sample user operation behavior and environmental parameters, so as to obtain a complete behavior evolution sequence covering the stable stage before environment switching, the switching transition stage and the adaptation stage after switching. The behavior evolution sequence is jointly mapped with the terminal environment parameters at the corresponding time to construct behavior environment association samples, so that each operation behavior corresponds to a clear terminal form and network state, which is used to characterize the coupling relationship between user operation characteristics and external environmental conditions. Based on the behavioral environment associated samples, the timing of the operation behavior is processed by a sliding window to extract behavioral representation parameters such as the distribution of operation time intervals, instruction call density and access path length, forming basic behavioral feature components to describe the user's operation rhythm and path structure. 
Multiple behavioral feature components obtained under the same terminal environment are combined to construct a corresponding subset of operational behavioral features. This subset, together with the subsets of behavioral features formed under different terminal forms and network environments, constitutes a dataset of operational behaviors under multiple terminal and multiple environment conditions, providing basic data support for subsequent environmental impact decoupling and behavioral intent analysis. It should be noted that the process of acquiring the above-mentioned operational behavior data and terminal environment parameters is not a simple summary of system logs. Instead, it introduces time synchronization, environmental trigger point marking, and a joint modeling mechanism for behavior and environment, so that the behavioral characteristics can truly reflect the interaction between user operations and terminal and network physical conditions. This lays the foundation for distinguishing between behavioral fluctuations caused by environmental changes and behavioral changes driven by abnormal user intentions.
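The trigger-point marking and observation windows described in this paragraph might look like the sketch below. It assumes each collected sample already carries a synchronized timestamp `t` and an environment identifier `env`; those field names and the window lengths are hypothetical.

```python
def find_trigger_points(samples):
    """Timestamps at which the environment identifier changes between samples.

    samples: chronological list of dicts with 't' (seconds) and 'env'
    (a hashable environment identifier); both keys are assumed names.
    """
    return [cur["t"] for prev, cur in zip(samples, samples[1:])
            if cur["env"] != prev["env"]]

def observation_window(samples, trigger_t, before_s, after_s):
    """Samples around a trigger point, covering the pre-switch stable stage,
    the transition, and the post-switch adaptation stage."""
    return [s for s in samples if trigger_t - before_s <= s["t"] <= trigger_t + after_s]

samples = [
    {"t": 0, "env": "desktop/intranet"},
    {"t": 30, "env": "desktop/intranet"},
    {"t": 60, "env": "laptop/vpn"},
    {"t": 90, "env": "laptop/vpn"},
    {"t": 200, "env": "laptop/vpn"},
]
triggers = find_trigger_points(samples)
sequence = observation_window(samples, triggers[0], before_s=60, after_s=60)
```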
[0016] In this embodiment of the invention, the process of decoupling the operational behavior data from environmental conditions based on the terminal environment parameters and constructing a standardized behavioral feature set that eliminates the impact of terminal performance differences and network latency is as follows:
Based on the acquired behavior-environment association samples, the terminal type, network connection method, and network latency status are discretized to form an environmental condition description vector, which characterizes the external technical environment state when the user's operation behavior occurs.
Based on the environmental condition description vector, the operation behavior time series is divided into several environmentally consistent behavior segments, so that the operation behavior in each segment occurs under relatively stable terminal performance and network conditions, thereby avoiding the distortion of behavioral features caused by directly mixing different environmental conditions.
For each environmentally consistent behavior segment, basic behavioral features such as operation time interval, instruction call density, and access path length are statistically analyzed. Network latency and terminal performance factors are introduced to perform environmental compensation correction on the basic behavioral features, in order to eliminate the disturbance of non-business factors on the behavioral features.
For example, for the k-th environmentally consistent behavior segment, the operation time interval feature after environmental compensation and standardization can be expressed as: [formula omitted in source]. The terms of this expression are: the operation time interval feature after environmental compensation; the original operation time interval feature; the network latency feature within the corresponding segment; the terminal performance constraint factor; and the preset weighting coefficients for the network latency feature and the terminal performance constraint factor, respectively. Both weighting coefficients are greater than 0 and satisfy a constraint that is also omitted in the source.
It should be noted that the weighting coefficients should be set according to the specific circumstances. For example, an expert weighting approach can be adopted, in which experts in relevant fields are invited to determine the preset proportion of each indicator through professional opinion surveys and comprehensive evaluation. The initial values of the two weighting coefficients can be 0.5 and 0.5.
It should be noted that the terminal performance constraint factor characterizes the comprehensive constraint that terminal hardware performance, system resource occupancy, and operating environment place on the user's operation response capability in the k-th environmentally consistent behavior segment. It captures the operation delay and behavior rhythm changes introduced by non-business factors such as insufficient terminal computing power, increased system load, or limited virtualization resources, thereby avoiding misjudging an operation slowdown caused by terminal performance differences as abnormal user behavior.
For example, the terminal performance constraint factor can be calculated from the system resource occupancy of the terminal within the corresponding behavior observation window, and its calculation can be expressed as: [formula omitted in source]. The terms of this expression are: the normalized average load rate of the terminal processor within the k-th environmentally consistent behavior segment; the normalized memory usage within the k-th environmentally consistent behavior segment; and the normalized representation of terminal input/output resource usage.
The same environmental compensation method is used for instruction call density and data access path characteristics. Behavioral feature components are constructed to eliminate the differences in terminal response capabilities and the impact of network fluctuations, so that the behavioral features obtained under different terminal and network conditions are comparable in the same feature space. After completing the environmental compensation correction, the behavioral feature components are normalized to map the behavioral features of different dimensions to a unified numerical range, thus obtaining a standardized behavioral feature vector. The standardized behavioral feature vectors obtained from different behavioral segments are concatenated in chronological order to construct a standardized behavioral feature set across terminals and network environments, which is used for subsequent behavioral structure drift analysis and user intent shift assessment. It should be noted that the environmental condition decoupling process is not a simple data normalization or static weighting, but rather a targeted compensation for behavioral characteristics by introducing terminal performance factors and network latency factors, so that standardized behavioral characteristics can more realistically reflect the user's operational intentions, thereby reducing the risk of false anomaly judgments caused by differences in terminal and network environments.
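The compensation expressions themselves are not legible in this text, so the sketch below is only one possible instantiation, not the patent's formula. It assumes the terminal performance constraint factor is the plain average of the three normalized resource usages, and that compensation divides the raw interval by one plus the weighted latency and performance terms, with weights summing to 1 to match the stated initial values of 0.5 and 0.5. All function names are hypothetical.

```python
def performance_factor(cpu_load, mem_usage, io_usage):
    """Assumed form: mean of normalized CPU load, memory usage, and I/O usage."""
    return (cpu_load + mem_usage + io_usage) / 3.0

def compensate(interval, latency, perf, alpha=0.5, beta=0.5):
    """Assumed divisive compensation: slow networks and loaded terminals
    inflate raw intervals, so the raw value is scaled down accordingly."""
    return interval / (1.0 + alpha * latency + beta * perf)

def min_max(values):
    """Map feature values of one dimension to the unified range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

perf = performance_factor(0.3, 0.6, 0.9)
compensated = compensate(interval=2.0, latency=1.0, perf=1.0)
normalized = min_max([1.0, 2.0, 3.0])
```

The same pattern could be applied to instruction call density and access path features, which the text says receive the same kind of compensation.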
[0017] In this embodiment of the invention, the process of extracting behavioral structural features representing changes in user operation rhythm, command granularity fluctuations, and access path evolution based on the standardized behavioral feature set, and constructing a behavior drift description set under multi-terminal and multi-environment conditions, is as follows:
Based on the standardized behavioral feature set, user operation behavior is divided in chronological order into multiple contiguous behavior analysis windows, which depict the structural state of user operation behavior in different time periods.
Within each behavior analysis window, statistical analysis is performed on the standardized operation time interval sequence to extract operation rhythm fluctuation features, including the degree of operation rhythm fluctuation, which characterizes the stability and trend of the user's operation rhythm. For example, the degree of operation rhythm fluctuation within a given behavior analysis window can be expressed as: [formula omitted in source]. The terms of this expression are: the standardized time interval of the i-th operation within the window; the average of the standardized time intervals within the window; and the number of operations performed within the window.
For the command granularity change features, the distribution of different types of instructions and interface calls is statistically analyzed in each behavior analysis window, and the command granularity ratio of high-granularity to low-granularity operations is calculated to depict the structural shift of user operations from fine interaction to batch or scripted operation. For example, user operation commands and interface calls are divided into high-granularity and low-granularity operations according to their scope and execution impact. High-granularity operations characterize fine-grained interactive behaviors on a single object or a small amount of data, while low-granularity operations characterize centralized operations on multiple objects, batch data, or automated processes. Within each behavior analysis window, the number of high-granularity operations and the number of low-granularity operations are counted separately, and the corresponding command granularity proportion is calculated: [formula omitted in source]. This proportion characterizes the share of low-granularity, batch operations within the overall operational behavior in the behavior analysis window.
The user data access path is serialized and compared between behavior analysis windows to extract access path evolution features, including the access path evolution distance, which characterizes the degree of change of the access path over time. For example, the access path evolution distance between two adjacent behavior analysis windows can be expressed as: [formula omitted in source], where the two sets in the expression are the sets of access paths within each of the two adjacent behavior analysis windows.
The operation rhythm fluctuation, command granularity change, and access path evolution features are combined to construct the behavior structure feature vector of the corresponding behavior analysis window.
By performing temporal fusion on the behavior structure feature vectors of consecutive behavior analysis windows, a behavioral structure change trajectory under multi-terminal and multi-environment conditions is formed, resulting in a behavior drift description set, which comprehensively depicts the evolution trend of user operation behavior at the temporal and structural levels. It should be noted that the behavior drift description set is not an isolated comparison of a single behavioral feature, but rather a comprehensive reflection of the coordinated changes in operation rhythm, command granularity, and access path structure.
Its technical significance lies in providing a structured criterion basis for distinguishing between environmentally induced normal behavioral drift and risky behavioral drift driven by abnormal intentions.
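The three window-level structural features can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the fluctuation degree is taken as a population standard deviation, the path evolution distance as a Jaccard distance between path sets, and each window is compared with the previous one. All function names and sample values are hypothetical.

```python
import math
from typing import List, Sequence, Set


def rhythm_fluctuation(intervals: Sequence[float]) -> float:
    """Population standard deviation of the standardized time intervals (assumed form)."""
    n = len(intervals)
    if n == 0:
        return 0.0
    mean = sum(intervals) / n
    return math.sqrt(sum((t - mean) ** 2 for t in intervals) / n)


def granularity_ratio(n_high: int, n_low: int) -> float:
    """Share of low-granularity (batch or scripted) operations in the window."""
    total = n_high + n_low
    return n_low / total if total else 0.0


def path_evolution_distance(paths_a: Set[str], paths_b: Set[str]) -> float:
    """Jaccard distance between two windows' access path sets (assumed form)."""
    union = paths_a | paths_b
    if not union:
        return 0.0
    return 1.0 - len(paths_a & paths_b) / len(union)


def structure_vector(intervals: Sequence[float], n_high: int, n_low: int,
                     prev_paths: Set[str], paths: Set[str]) -> List[float]:
    """Behavior structure feature vector for one behavior analysis window."""
    return [
        rhythm_fluctuation(intervals),
        granularity_ratio(n_high, n_low),
        path_evolution_distance(prev_paths, paths),
    ]


vector = structure_vector(
    intervals=[1.0, 2.0, 3.0, 2.0],
    n_high=6,
    n_low=2,
    prev_paths={"crm/orders", "crm/customers"},
    paths={"crm/orders", "dw/export"},
)
```

Collecting one such vector per window, in time order, gives a simple form of the behavior structure change trajectory.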
[0018] In this embodiment of the invention, the process of distinguishing, based on the behavior drift description set, between the behavior structure shift component caused by changes in the terminal environment and the behavior shift component caused by changes in user operation intent, and of calculating the proportion of environment-induced behavior drift, is as follows.
Based on the behavior drift description set, the behavior structure feature vectors of adjacent windows are differenced in the time order of the behavior analysis windows to obtain the total behavior drift vector, which characterizes the magnitude of behavior structure change.
The total behavior drift vector is analyzed jointly with the changes in terminal environment parameters within the corresponding window to construct a behavior response reference vector driven by environmental changes. This vector characterizes the typical impact patterns on behavior structure of environmental factors such as changes in terminal performance, network status fluctuations, and input method switching. The construction of the behavior response reference vector $\mathbf{E}_k$ includes: obtaining the changes in terminal type, network connection status, network latency level, and input method identifier between adjacent behavior analysis windows, and mapping the changes in environmental parameters into the same feature space as the behavior structure feature vector, so that environmental changes form corresponding response representations in the behavior structure space.
By calculating the projection of the total behavior drift vector onto the direction of the environment-driven behavior response reference vector, the environment-induced behavior drift component caused by the change in the terminal environment is extracted. For example, in the $k$-th behavior analysis window, the environment-induced behavior drift component $\Delta\mathbf{B}_k^{env}$ can be represented as:
$$\Delta\mathbf{B}_k^{env} = \frac{\Delta\mathbf{B}_k \cdot \mathbf{E}_k}{\mathbf{E}_k \cdot \mathbf{E}_k}\,\mathbf{E}_k$$
where $\Delta\mathbf{B}_k$ represents the total behavior drift vector for the corresponding window, $\mathbf{E}_k$ represents the behavior response reference vector constructed from changes in terminal environment parameters, and $\cdot$ represents the vector dot product.
The proportion of environment-induced behavior drift is calculated from the magnitude relationship between the environment-induced behavior drift component and the total behavior drift vector. For example, the proportion of environment-induced behavior drift $\rho_k$ can be represented as:
$$\rho_k = \frac{\left\lVert \Delta\mathbf{B}_k^{env} \right\rVert}{\left\lVert \Delta\mathbf{B}_k \right\rVert}$$
where $\lVert\cdot\rVert$ represents the vector norm.
By performing time-series fusion of the proportions of environment-induced behavior drift over consecutive behavior analysis windows, a sequence of environment-induced behavior drift proportions is formed, which reflects the degree to which changes in user behavior structure are dominated by environmental factors under multi-terminal, multi-environment conditions.
It should be noted that the differentiation of behavior drift components is not based on static thresholds or single-feature judgments, but on quantitative analysis of the correlation between the direction of behavior structure change and the direction of environmental change. Environment-induced behavior drift and user-intent-driven behavior drift thereby become separable in the structural space, providing a more reliable input for subsequent abnormal behavior risk assessment.
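The projection step can be illustrated with a short sketch. It assumes the environment-induced component is the orthogonal projection of the total drift vector onto the reference vector, and that the proportion is the ratio of Euclidean norms. The function names and the sample vectors are hypothetical.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Sequence[float]) -> float:
    return math.sqrt(dot(a, a))


def environment_component(total_drift: Sequence[float],
                          env_reference: Sequence[float]) -> List[float]:
    """Project the total drift vector onto the environment reference direction."""
    denom = dot(env_reference, env_reference)
    if denom == 0.0:
        # No environmental change in this window: nothing is attributed to it.
        return [0.0] * len(total_drift)
    scale = dot(total_drift, env_reference) / denom
    return [scale * e for e in env_reference]


def environment_drift_ratio(total_drift: Sequence[float],
                            env_reference: Sequence[float]) -> float:
    """Norm of the environment-induced component relative to the total drift."""
    total = norm(total_drift)
    if total == 0.0:
        return 0.0
    return norm(environment_component(total_drift, env_reference)) / total


total_drift = [3.0, 4.0, 0.0]    # change between two adjacent windows
env_reference = [1.0, 0.0, 0.0]  # direction in which the environment change acts
env_component = environment_component(total_drift, env_reference)
ratio = environment_drift_ratio(total_drift, env_reference)
```

Because the component is an orthogonal projection, the ratio always lies between 0 and 1 and equals the absolute cosine of the angle between the two vectors.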
[0019] In this embodiment of the invention, the process of calculating the user behavior intent offset after removing environmental influence, based on the proportion of environment-induced behavior drift and the user's historical behavior baseline, is as follows.
Based on the sequence of environment-induced behavior drift proportions, the total behavior drift vector within the user's current behavior analysis window is subjected to environmental impact reduction to obtain a net behavior change vector from which environmental disturbance has been preliminarily removed. This vector characterizes the behavior change trend under the current terminal and network conditions that is not directly driven by environmental changes. For example, in the $k$-th behavior analysis window, the net behavior change vector $\Delta\mathbf{B}_k^{net}$ after environmental impact reduction can be represented as:
$$\Delta\mathbf{B}_k^{net} = \left(1 - \rho_k\right)\Delta\mathbf{B}_k$$
where $\Delta\mathbf{B}_k$ represents the total behavior drift vector for the corresponding window and $\rho_k$ represents the proportion of environment-induced behavior drift within the window.
After the net behavior change vector is obtained, it is aligned and compared with the user's historical behavior baseline. The user's historical behavior baseline consists of the statistical distribution of the behavior structure features formed by the user in a long-term stable business environment, and characterizes the user's typical range of behavioral intent under normal business operation.
By calculating the degree of deviation of the net behavior change vector from the historical behavior baseline, the user behavior intent offset is extracted, which represents the difference between the user's current behavior and the user's historically stable behavior pattern. For example, the user behavior intent offset $I_k$ can be represented as:
$$I_k = \sqrt{\left(\Delta\mathbf{B}_k^{net} - \boldsymbol{\mu}\right)^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}\left(\Delta\mathbf{B}_k^{net} - \boldsymbol{\mu}\right)}$$
where $\boldsymbol{\mu}$ represents the mean behavior vector of the user's historical behavior baseline and $\boldsymbol{\Sigma}$ represents the corresponding covariance matrix; the expression as a whole is a weighted distance metric under the baseline covariance constraint.
The user behavior intent offsets obtained over consecutive behavior analysis windows are smoothed to reduce the influence of occasional operations or short-term business disturbances on the result, forming a stable user behavior intent offset sequence.
It should be noted that the user's historical behavior baseline is not fixed; it is continuously updated and adaptively adjusted from long-term stable behavior segments to reflect the evolution of the user's responsibilities and business habits, which avoids misjudging normal changes in responsibilities as abnormal intent shifts. Through the above process, the user behavior intent offset after removing the influence of terminal environment and network conditions is obtained, and serves as one of the core criteria for the subsequent identification of abnormal internal user behavior and the assessment of data leakage risk.
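The offset calculation can be sketched as follows. The sketch assumes that environmental impact reduction scales the total drift vector by one minus the environment-induced proportion, that the deviation is a Mahalanobis distance to the baseline, and that smoothing is an exponential moving average; the text does not name a smoothing method, so the last choice in particular is an assumption. All names and numbers are hypothetical.

```python
import math
from typing import List, Sequence


def solve(matrix: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - tail) / a[i][i]
    return x


def net_change(total_drift: Sequence[float], env_ratio: float) -> List[float]:
    """Keep only the share of the drift not attributed to the environment."""
    return [(1.0 - env_ratio) * d for d in total_drift]


def intent_offset(net: Sequence[float], mean: Sequence[float],
                  cov: Sequence[Sequence[float]]) -> float:
    """Mahalanobis distance of the net change vector from the baseline."""
    diff = [v - m for v, m in zip(net, mean)]
    weighted = solve(cov, diff)  # equals inverse(cov) @ diff
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)))


def smooth(offsets: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Exponential moving average over consecutive windows (assumed method)."""
    out: List[float] = []
    for value in offsets:
        out.append(value if not out else alpha * value + (1.0 - alpha) * out[-1])
    return out


net = net_change([3.0, 4.0], env_ratio=0.6)
offset = intent_offset(net, mean=[0.0, 0.0], cov=[[0.04, 0.0], [0.0, 0.16]])
smoothed = smooth([1.0, 5.0, 1.0])
```

Solving the linear system instead of inverting the covariance matrix keeps the sketch dependency-free; with NumPy available, `numpy.linalg.solve` would serve the same purpose.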
[0020] In this embodiment of the invention, the process of calculating an internal user abnormal behavior risk index based on the behavior intent offset combined with data access depth, access frequency, and cross-system correlation, generating a data leakage risk assessment result, and thereby accurately monitoring internal user anomalies and data leakage behavior in scenarios with switching between multiple terminals and environments, is as follows.
Based on the user behavior intent offset sequence, the data access behavior features corresponding to the current behavior analysis window are obtained, including the number of data accesses per unit time, the depth of the data layers involved in a single access, and the access association across different business systems. These characterize the intensity and coverage of the user's contact with data at the current stage.
The data access behavior features are structured to form an access frequency feature, an access depth feature, and a cross-system switching frequency feature. The access depth feature reflects the user's trend of moving from surface information to core sensitive data, while the cross-system feature reflects the user's behavior of horizontally integrating data between different systems. For example, the frequency $C$ of cross-system switching across different business systems can be represented as:
$$C = \frac{N_{cross}}{N_{total}}$$
where $N_{total}$ represents the total number of accesses per unit time and $N_{cross}$ represents the number of those accesses that involve different systems.
The user behavior intent offset is jointly modeled with the access frequency feature, access depth feature, and cross-system switching frequency feature to construct an internal user abnormal behavior risk assessment function and calculate the internal user abnormal behavior risk index, which quantifies the degree of abnormality of the user's current behavior relative to the user's normal business behavior. For example, in the $k$-th behavior analysis window, the internal user abnormal behavior risk index $R_k$ can be represented as:
$$R_k = \alpha_1 I_k + \alpha_2 F_k + \alpha_3 D_k + \alpha_4 C_k$$
where $I_k$ represents the user behavior intent offset within the corresponding window, $F_k$ represents the access frequency feature, $D_k$ represents the data access depth feature, and $C_k$ represents the cross-system switching frequency feature. $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$ represent the preset weighting coefficients for the user behavior intent offset, access frequency feature, access depth feature, and cross-system switching frequency feature, respectively; all four are greater than 0 and satisfy $\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 = 1$.
It should be noted that $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$ should be set according to the specific circumstances. For example, an expert weighting method can be adopted, in which experts in the relevant fields determine the preset proportion of each indicator through professional opinion surveys and comprehensive evaluation. The initial values of $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$ can be 0.25, 0.25, 0.25, and 0.25.
A data leakage risk assessment result is generated by comparing the internal user abnormal behavior risk index with a preset internal user abnormal behavior risk index threshold.
When the internal user abnormal behavior risk index is greater than or equal to the threshold, the current internal user behavior is determined to be in a high-risk abnormal state, and a result indicating high-risk data leakage behavior by the internal user is generated. The reason is that, after the influence of terminal performance differences and the network environment has been removed, the user's behavior intent offset still deviates significantly from the user's historical behavior baseline. Moreover, this offset coincides with an abnormal increase in data access depth, a significant increase in access frequency per unit time, and an expansion of the scope of cross-system access. This indicates that the user's current operation behavior has deviated from normal business intent and shows the behavioral characteristics of acquiring or leaking sensitive data for abnormal purposes.
When the internal user abnormal behavior risk index is less than the threshold, the current internal user behavior is determined to be in a controllable or low-risk state, and a low-risk data leakage assessment result is generated. The reason is that the user's behavior intent offset in the standardized behavioral feature space lies within the range of historical behavioral fluctuation, the data access depth, access frequency, and degree of cross-system correlation are consistent with the existing business behavior pattern, and there is no abnormal concentration of data access or abnormal diffusion trend.
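The weighted risk index and the threshold comparison can be sketched as below. The sketch assumes that all four inputs have already been scaled to comparable dimensionless values; the threshold of 0.6, the class and function names, and the sample numbers are hypothetical, and only the equal initial weights of 0.25 come from the description.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class WindowFeatures:
    intent_offset: float           # user behavior intent offset for the window
    access_frequency: float        # access frequency feature
    access_depth: float            # data access depth feature
    cross_system_switching: float  # cross-system switching frequency feature


def cross_system_switching_frequency(total_accesses: int,
                                     cross_system_accesses: int) -> float:
    """Share of accesses per unit time that involve different systems."""
    return cross_system_accesses / total_accesses if total_accesses else 0.0


def risk_index(features: WindowFeatures,
               weights: Sequence[float] = (0.25, 0.25, 0.25, 0.25)) -> float:
    """Weighted sum of the four features; weights must be positive and sum to 1."""
    if len(weights) != 4 or any(w <= 0 for w in weights):
        raise ValueError("four positive weights are required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    values = (features.intent_offset, features.access_frequency,
              features.access_depth, features.cross_system_switching)
    return sum(w * v for w, v in zip(weights, values))


def assess(index: float, threshold: float) -> str:
    """High risk when the index reaches or exceeds the threshold."""
    return "high-risk" if index >= threshold else "low-risk"


switching = cross_system_switching_frequency(total_accesses=40,
                                             cross_system_accesses=20)
window = WindowFeatures(intent_offset=0.8, access_frequency=0.6,
                        access_depth=0.7, cross_system_switching=switching)
index = risk_index(window)
result = assess(index, threshold=0.6)  # hypothetical threshold
```

Validating the weights before use reflects the stated constraint that every coefficient is positive and that the coefficients sum to 1.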
[0021] This invention introduces terminal environment parameters at the behavior modeling stage and decouples operational behavior data from environmental conditions. This separates non-business factors such as terminal performance differences, network latency fluctuations, and input method changes from user behavior features. Under actual operating conditions with frequent switching between multiple terminals and network environments, user behavior features can be mapped to a unified and comparable standardized feature space, which avoids overall drift of the behavioral baseline caused by changes in objective technical conditions. By extracting behavioral structural features that reflect the evolution of operation rhythm, command granularity, and access paths, and by constructing a behavior drift description set, this invention quantitatively distinguishes environment-induced structural shifts from changes in actual user operational intent. It can therefore accurately identify which behavioral fluctuations originate from changes in terminal and network conditions and which behavioral deviations stem from potentially abnormal or risky intent. The behavior intent offset after removing environmental influence is calculated with reference to the user's historical behavior baseline, and is integrated with risk-sensitive indicators such as data access depth, access frequency, and cross-system correlation to form an internal user abnormal behavior risk index with a clear causal explanation path. This reduces the probability of false alarms caused by environmental changes in complex office environments and in scenarios with widespread automated operations, improves the ability to identify covert and gradual data leakage, and achieves stable, accurate, and explainable monitoring of abnormal internal user behavior and data leakage risk.
[0022] The above formulas are all calculated on dimensionless values. The formulas were obtained through software simulation on a large amount of collected data so as to approximate real-world conditions as closely as possible. The preset parameters in the formulas are set by those skilled in the art according to the actual situation.
[0023] It should be understood that in the various embodiments of this application, the order of the above-mentioned processes does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
[0024] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. An internal user anomaly and data leakage monitoring method based on AI behavior analysis, characterized in that it comprises the following steps: acquiring operational behavior data of internal users under different terminal forms and network environments, together with the corresponding terminal environment parameters; decoupling the operational behavior data from environmental conditions based on the terminal environment parameters to construct a standardized behavioral feature set; extracting, based on the standardized behavioral feature set, behavioral structural features representing changes in user operation rhythm, fluctuations in command granularity, and evolution of access paths, to construct a behavior drift description set; distinguishing, based on the behavior drift description set, the behavior structure shift component caused by changes in the terminal environment from the behavior shift component caused by changes in user operation intent, and calculating the proportion of environment-induced behavior drift; calculating, based on the proportion of environment-induced behavior drift and the user's historical behavior baseline, the user behavior intent offset after removing environmental influence; and calculating, based on the user behavior intent offset combined with data access depth, access frequency, and cross-system correlation, an internal user abnormal behavior risk index, and generating a data leakage risk assessment result.
2. The method for monitoring internal user anomalies and data leaks based on AI behavior analysis according to claim 1, characterized in that the process of acquiring operational behavior data of internal users under different terminal forms and network environments, together with the corresponding terminal environment parameters, is as follows: behavior audit collection modules and environmental status awareness modules are deployed at the access nodes of the enterprise's internal business systems, data systems, and operation and maintenance systems to continuously collect user operation behavior data and the corresponding terminal environment parameters during normal user business operations and terminal environment switching; a uniform behavior sampling granularity is set for the collected operational behavior data and terminal environment parameters, and a precise timestamp and an environment identifier are attached to each operational behavior; when terminal switching, network state change, or operation mode change events are detected, the corresponding time is marked as an environment switching trigger point, and continuous observation windows are set before and after the trigger point to jointly sample user operation behavior and environmental parameters, obtaining a complete behavior evolution sequence covering the stable stage before environment switching, the switching transition stage, and the adaptation stage after switching; the behavior evolution sequence is jointly mapped with the terminal environment parameters at the corresponding times to construct behavior-environment associated samples; based on the behavior-environment associated samples, the time series of operation behavior is processed with a sliding window to extract behavioral representation parameters such as the operation time interval distribution, instruction call density, and access path length, forming basic behavioral feature components; and multiple behavioral feature components obtained under the same terminal environment are combined to construct a corresponding operational behavior feature subset, which together with the behavior feature subsets formed under different terminal forms and network environment conditions constitutes an operational behavior dataset under multi-terminal, multi-environment conditions.
3. The method for monitoring internal user anomalies and data leaks based on AI behavior analysis according to claim 2, characterized in that the process of decoupling the operational behavior data from environmental conditions based on the terminal environment parameters and constructing a standardized behavioral feature set is as follows: based on the acquired behavior-environment associated samples, the terminal type, network connection method, and network latency status are discretized to form an environmental condition description vector; based on the environmental condition description vector, the operation behavior sequence is divided into several environmentally consistent behavior segments; for each environmentally consistent behavior segment, the basic behavioral features of operation time interval, instruction call density, and access path length are statistically analyzed, and network latency and terminal performance factors are introduced to perform environmental compensation correction on the basic behavioral features; after the environmental compensation correction, the behavioral feature components are normalized to map behavioral features of different dimensions to a unified numerical range, obtaining a standardized behavioral feature vector; and the standardized behavioral feature vectors obtained from the different behavior segments are concatenated in chronological order to construct a standardized behavioral feature set spanning multiple terminals and network environments.
4. The method for monitoring internal user anomalies and data leaks based on AI behavior analysis according to claim 3, characterized in that the process of extracting, based on the standardized behavioral feature set, behavioral structural features representing changes in user operation rhythm, fluctuations in command granularity, and evolution of access paths, and constructing a behavior drift description set, is as follows: based on the standardized behavioral feature set, user operation behavior is divided continuously in chronological order into multiple successive behavior analysis windows; within each behavior analysis window, statistical analysis is performed on the standardized operation time interval sequence to extract operation rhythm fluctuation features, including the operation rhythm fluctuation degree; for the command granularity change features, the distribution of the different types of instructions and interface calls is statistically analyzed in each behavior analysis window, and the command granularity ratio between high-granularity operations and low-granularity operations is calculated; the user data access path is serialized and compared between behavior analysis windows to extract access path evolution features, including the access path evolution distance; the operation rhythm fluctuation, command granularity change, and access path evolution features are combined to construct the behavior structure feature vector of the corresponding behavior analysis window; and by performing temporal fusion on the behavior structure feature vectors of consecutive behavior analysis windows, the behavior drift description set is obtained.
5. The method for monitoring internal user anomalies and data leaks based on AI behavior analysis according to claim 4, characterized in that the process of distinguishing, based on the behavior drift description set, the behavior structure shift component caused by changes in the terminal environment from the behavior shift component caused by changes in user operation intent, and calculating the proportion of environment-induced behavior drift, is as follows: based on the behavior drift description set, the behavior structure feature vectors of adjacent windows are differenced in the time order of the behavior analysis windows to obtain the total behavior drift vector; the total behavior drift vector is analyzed jointly with the changes in terminal environment parameters within the corresponding window to construct a behavior response reference vector driven by environmental changes; by calculating the projection of the total behavior drift vector onto the direction of the behavior response reference vector, the environment-induced behavior drift component is extracted; the proportion of environment-induced behavior drift is calculated from the magnitude relationship between the environment-induced behavior drift component and the total behavior drift vector; and by performing time-series fusion of the proportions of environment-induced behavior drift over consecutive behavior analysis windows, a sequence of environment-induced behavior drift proportions is formed.
6. The method for monitoring internal user anomalies and data leaks based on AI behavior analysis according to claim 5, characterized in that the process of calculating the user behavior intent offset after removing environmental influence, based on the proportion of environment-induced behavior drift and the user's historical behavior baseline, is as follows: based on the sequence of environment-induced behavior drift proportions, the total behavior drift vector within the user's current behavior analysis window is subjected to environmental impact reduction to obtain a net behavior change vector from which environmental disturbance has been preliminarily removed; after the net behavior change vector is obtained, it is aligned and compared with the user's historical behavior baseline; by calculating the degree of deviation of the net behavior change vector from the historical behavior baseline, the user behavior intent offset is extracted; and the user behavior intent offsets obtained over consecutive behavior analysis windows are smoothed to form a user behavior intent offset sequence.
7. The method for monitoring internal user anomalies and data leaks based on AI behavior analysis according to claim 6, characterized in that the process of calculating the internal user abnormal behavior risk index based on the user behavior intent offset combined with data access depth, access frequency, and cross-system correlation, and generating a data leakage risk assessment result, is as follows: based on the user behavior intent offset sequence, the data access behavior features corresponding to the current behavior analysis window are obtained, and the data access behavior features are structured to form an access frequency feature, an access depth feature, and a cross-system switching frequency feature; the user behavior intent offset is jointly modeled with the access frequency feature, access depth feature, and cross-system switching frequency feature to construct an internal user abnormal behavior risk assessment function and calculate the internal user abnormal behavior risk index; a data leakage risk assessment result is generated by comparing the internal user abnormal behavior risk index with a preset internal user abnormal behavior risk index threshold; when the internal user abnormal behavior risk index is greater than or equal to the threshold, the current internal user behavior is determined to be in a high-risk abnormal state, and a result indicating high-risk data leakage behavior by the internal user is generated; and when the internal user abnormal behavior risk index is less than the threshold, the current internal user behavior is determined to be in a low-risk state, and a low-risk data leakage assessment result is generated.