A large-scale network-oriented traffic real-time monitoring analysis method and system
By constructing a global observation matrix using dynamic time slicing and state transition codes, the problem of locating abnormal traffic events in large-scale networks is solved. This enables accurate location and source tracing without relying on prior knowledge, and enhances the sensitivity and accuracy of anomaly identification.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANGHAI FUHUA NETWORK TECH CO LTD
- Filing Date
- 2026-05-28
- Publication Date
- 2026-06-26
AI Technical Summary
Existing traffic monitoring and analysis methods struggle to accurately identify abnormal traffic events in large-scale networks without relying on prior knowledge of attacks. In particular, they are easily overwhelmed by normal background traffic during periods of low traffic density and short-term, sudden abnormal behavior. Furthermore, traditional methods ignore changes in packet length and arrival intervals between adjacent data packets within the traffic flow, leading to failure in anomaly localization.
Dynamic time segmentation is performed based on the five-tuple information and timestamps of traffic data packets to construct state transition codes. The direction of packet length change and the direction of arrival interval change are used to characterize traffic behavior patterns. A global observation matrix is constructed at the central analysis node, and node state deviation analysis is performed in combination with the historical baseline feature matrix to achieve the location constraint of abnormal traffic events.
Without relying on prior knowledge, it can accurately segment traffic behavior segments, capture hidden anomaly features, enhance the sensitivity of anomaly identification, realize global observation and rapid source tracing of large-scale networks, reduce the false judgment rate, and meet the needs of traffic operation and maintenance.
Figure CN122293501A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of traffic monitoring and analysis technology, and more specifically, to a method and system for real-time traffic monitoring and analysis for large-scale networks.
Background Technology
[0002] In today's highly digitalized business environment, traffic data has become a core indicator for measuring business health. With the popularization of microservices and cloud-native architectures, a single user request may span dozens of distributed nodes. Traditional monitoring methods based on logs or single-point counting are no longer sufficient to meet the real-time and accuracy requirements of massive high-concurrency scenarios. At the same time, malicious crawlers and sudden traffic attacks exacerbate the risk of data distortion. Therefore, modern traffic monitoring and analysis needs to build an integrated technical system that covers data collection, real-time calculation, intelligent verification, and visualization to achieve second-level insight into business fluctuations and automatic alarms for anomalies, providing decision support for system stability and accurate operation.
[0003] In existing traffic monitoring and analysis, the process begins by embedding data points at network entry and exit points or in application code to collect raw access logs or request records. The system then aggregates the data by time window, calculates key indicators, and further decomposes the traffic based on these indicators to identify traffic hotspots and high-frequency interfaces. By comparing this data with historical data from the same period, the system can detect abnormal fluctuations such as sudden increases or decreases in traffic or spikes in error rates, thus completing the monitoring and analysis. However, existing network traffic monitoring methods for large-scale networks, especially in identifying abnormal traffic events, generally rely on prior attack knowledge. This approach suffers from several problems: First, macroscopic traffic characteristics are easily overwhelmed by normal background traffic when faced with low-density, short-term, sudden abnormal behavior. Second, methods based on fixed time windows or packet number segmentation ignore the directional dependence of adjacent data packets within the traffic flow in terms of packet length and arrival interval, making it impossible to capture microscopic sequence patterns generated by specific attack techniques. This leads to the failure to locate abnormal traffic events in large-scale network traffic. Therefore, how to achieve location constraints for abnormal traffic events without relying on prior attack knowledge has become a challenge for the industry.
Summary of the Invention
[0004] This application provides a method and system for real-time traffic monitoring and analysis for large-scale networks, which can locate and constrain abnormal traffic events without relying on prior knowledge of attacks.
[0005] In a first aspect, this application provides a method for real-time traffic monitoring and analysis of large-scale networks, comprising the following steps: After acquiring the traffic data packets of each network node, dynamic time segmentation is performed based on the five-tuple information and timestamp of each traffic data packet to obtain multiple traffic behavior segments. For each traffic behavior segment, the packet length change direction and arrival interval change direction between adjacent traffic data packets are calculated based on the packet length sequence and arrival interval sequence corresponding to the traffic behavior segment. A state transition code characterizing the micro-behavioral pattern within the traffic behavior segment is constructed using the packet length change direction and the arrival interval change direction. The state transition codes generated by each network node within the same time window are aggregated and sent to the central analysis node to construct the global network analysis space, so as to obtain the global observation matrix. Based on the global observation matrix of the current time window and the baseline feature matrix composed of historical normal traffic state transition codes, node state deviation analysis is performed on each network node to obtain the state deviation vector corresponding to each network node. The abnormal traffic events within the current time window are located and constrained based on the projection components of the state deviation vectors of each network node in the global network analysis space.
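As a purely illustrative sketch of the node state deviation analysis named above: this passage does not specify how the state deviation vector is computed, so the following Python example assumes (hypothetically) that each matrix row holds per-code occurrence counts for one node, that rows are normalized to relative frequencies, and that the deviation is the element-wise difference between the current row and the baseline row. The function names `normalize` and `deviation_vectors` are illustrative only, and the projection step is not covered.

```python
def normalize(row):
    """Convert a row of occurrence counts into relative frequencies."""
    total = sum(row)
    if total == 0:
        return [0.0] * len(row)
    return [value / total for value in row]


def deviation_vectors(observation, baseline):
    """Return one deviation vector per node.

    ASSUMPTION: deviation = normalized current row - normalized baseline row.
    Both matrices must share the same row (node) and column (code) order.
    """
    vectors = []
    for obs_row, base_row in zip(observation, baseline):
        obs_freq = normalize(obs_row)
        base_freq = normalize(base_row)
        vectors.append([o - b for o, b in zip(obs_freq, base_freq)])
    return vectors


# Two nodes, two distinct state transition codes (hypothetical counts).
observation = [[2, 2], [4, 0]]
baseline = [[1, 1], [1, 1]]
vectors = deviation_vectors(observation, baseline)
```

Under these assumptions, a node whose code distribution matches its baseline yields an all-zero vector, while a node dominated by a single code yields non-zero components.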
[0006] In some embodiments, dynamic time segmentation is performed based on the five-tuple information and timestamp of each traffic data packet to obtain multiple traffic behavior segments, specifically including: Extract the 5-tuple information and timestamp of each traffic data packet to generate the initial traffic record; Each initial traffic record is hashed and split according to the 5-tuple information, so that initial traffic records with the same 5-tuple are grouped into the same processing queue. After sorting the initial traffic records in the same processing queue according to their timestamps, the timestamp of the initial traffic record at the head of the queue is taken as the starting time. The difference between each subsequent timestamp and the starting time is compared with a preset time window benchmark value. When the difference exceeds the time window benchmark value for the first time, all the initial traffic records that have been included in the current time window are encapsulated into a traffic behavior fragment, and the initial traffic record that triggered the exceedance is taken as the new starting time to start the aggregation of the next time window. Continue processing all initial traffic records in the same processing queue until all initial traffic records have been allocated to obtain multiple traffic behavior fragments.
[0007] In some embodiments, hashing and splitting the initial traffic records according to their 5-tuple information, so that initial traffic records with the same 5-tuple are grouped into the same processing queue, specifically includes: Hash the 5-tuple information in each initial traffic record to obtain a fixed-length 5-tuple hash value; Using the quintuple hash value as the input key, the initial traffic record is mapped to a hash ring consisting of multiple processing units. The target processing unit is determined by taking the modulo of the fixed-length quintuple hash value with the total number of processing units. Within each processing unit, the 5-tuple hash value carried by the initial traffic record received by the target processing unit is used as the queue index key, and initial traffic records with the same queue index key are grouped into the same processing queue.
[0008] In some embodiments, calculating the direction of packet length change and the direction of arrival interval change between adjacent traffic data packets based on the packet length sequence and arrival interval sequence corresponding to the traffic behavior segment specifically includes: Obtain the packet length sequence and arrival interval sequence arranged in the order of arrival of traffic data packets from the traffic behavior segment; For the packet length sequence, the difference between the second packet length and the first packet length in two adjacent packets is calculated in turn, and the direction of the packet length is identified based on the difference, thereby converting the packet length sequence into a packet length change direction sequence; For the arrival interval sequence, the difference between the next arrival interval and the previous arrival interval is calculated sequentially between two adjacent arrival intervals. The direction of the interval is identified based on the difference, thereby converting the arrival interval sequence into an arrival interval change direction sequence.
[0009] In some embodiments, constructing a state transition code characterizing the micro-behavioral pattern within the traffic behavior segment using the packet length change direction and the arrival interval change direction specifically includes: Obtain the packet length change direction sequence and the arrival interval change direction sequence; From the packet length change direction sequence and the arrival interval change direction sequence, the packet length change direction identifier and the arrival interval change direction identifier at the same sequence position are extracted sequentially. The packet length change direction identifier and the arrival interval change direction identifier are combined into a tuple, and the tuple is used as the micro-behavioral state symbol at the sequence position. The micro-behavioral state symbols obtained from all sequence positions within the traffic behavior segment are concatenated in chronological order of the original sequence to form a state symbol sequence, and the state symbol sequence is used as the state transition code representing the micro-behavioral pattern within the traffic behavior segment.
[0010] In some embodiments, the state transition codes generated by each network node within the same time window are aggregated to the central analysis node to construct the global network analysis space, thereby obtaining the global observation matrix. Specifically, this includes: The central analysis node receives state transition codes reported by each network node, which carry node identifiers and time window identifiers. The received state transition codes are aggregated according to the time window identifier to obtain the set of state transition codes of each node under the current time window to be analyzed, with the node identifier as the grouping key; A global observation matrix is constructed using the current time window to be analyzed as the matrix index, the network node identifiers participating in the reporting as the row index, and the enumeration results of all dissimilar state transition codes extracted from the state transition code sets of each node as the column index.
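As a minimal sketch of the matrix construction described above, the following Python example takes reported tuples of node identifier, time window identifier, and state transition code, keeps those belonging to the window to be analyzed, and builds a matrix with node identifiers as rows and distinct state transition codes as columns. The cell value is not specified in this passage; the sketch assumes it is the number of times a node reported a given code. The function name `build_observation_matrix` and the string encoding of codes are illustrative assumptions.

```python
def build_observation_matrix(reports, window_id):
    """Build (rows, cols, matrix) for one time window.

    reports: iterable of (node_id, window_id, code) tuples.
    ASSUMPTION: each cell counts how often a node reported a code.
    """
    per_node = {}
    for node_id, win, code in reports:
        if win == window_id:
            per_node.setdefault(node_id, []).append(code)

    rows = sorted(per_node)  # row index: reporting node identifiers
    cols = sorted({c for codes in per_node.values() for c in codes})
    col_index = {code: j for j, code in enumerate(cols)}

    matrix = [[0] * len(cols) for _ in rows]
    for i, node_id in enumerate(rows):
        for code in per_node[node_id]:
            matrix[i][col_index[code]] += 1
    return rows, cols, matrix


reports = [
    ("n1", 7, "+-|00"),
    ("n1", 7, "+-|00"),
    ("n2", 7, "-+"),
    ("n2", 8, "00"),  # belongs to a different window and is ignored
]
rows, cols, matrix = build_observation_matrix(reports, window_id=7)
```

Sorting the row and column keys keeps the layout deterministic, so a baseline feature matrix built the same way can be compared position by position.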
[0011] In some embodiments, traffic data packets of each network node are obtained through a bypass traffic probe.
[0012] In a second aspect, this application provides a real-time traffic monitoring and analysis system for large-scale networks, used to perform the above-described real-time traffic monitoring and analysis method for large-scale networks. The system includes: The acquisition module is used to dynamically time-slice the traffic data packets of each network node based on the five-tuple information and timestamp of each traffic data packet to obtain multiple traffic behavior segments. The processing module is used to calculate the packet length change direction and arrival interval change direction between adjacent traffic data packets based on the packet length sequence and arrival interval sequence corresponding to each traffic behavior segment, and to construct a state transition code that characterizes the micro-behavioral pattern within the traffic behavior segment through the packet length change direction and the arrival interval change direction. The processing module is also used to aggregate the state transition codes generated by each network node within the same time window to the central analysis node to construct the global network analysis space, so as to obtain the global observation matrix. The processing module is also used to perform node state deviation analysis on each network node based on the global observation matrix of the current time window and the baseline feature matrix composed of historical normal traffic state transition codes, so as to obtain the state deviation vector corresponding to each network node. The execution module is used to locate and constrain abnormal traffic events within the current time window based on the projection components of the state deviation vectors of each network node in the global network analysis space.
[0013] In a third aspect, this application provides a computer device including a memory and a processor, the memory storing code, and the processor being configured to acquire the code and execute the above-described method for real-time traffic monitoring and analysis for large-scale networks.
[0014] In a fourth aspect, this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described method for real-time traffic monitoring and analysis for large-scale networks.
[0015] The technical solutions provided by the embodiments disclosed in this application have the following beneficial effects: The method and system for real-time traffic monitoring and analysis of large-scale networks provided in this application firstly acquires traffic data packets from each network node, and then performs dynamic time segmentation based on the five-tuple information and timestamp of each traffic data packet to obtain multiple traffic behavior segments. Secondly, for each traffic behavior segment, the direction of packet length change and the direction of arrival interval change between adjacent traffic data packets are calculated based on the packet length sequence and arrival interval sequence corresponding to the traffic behavior segment, and a state transition code representing the micro-behavioral pattern within the traffic behavior segment is constructed using the packet length change direction and the arrival interval change direction. Further, the state transition codes generated by each network node within the same time window are aggregated to a central analysis node to construct a global network analysis space to obtain a global observation matrix. Then, based on the global observation matrix of the current time window and the baseline feature matrix composed of historical normal traffic state transition codes, node state deviation analysis is performed on each network node to obtain the state deviation vector corresponding to each network node. Finally, the abnormal traffic events within the current time window are located and constrained based on the projection components of the state deviation vector corresponding to each network node in the global network analysis space.
[0016] Therefore, this application can achieve the location constraint of abnormal traffic events without relying on prior attack knowledge. First, by dynamically segmenting traffic behavior fragments based on the five-tuple of traffic data packets and timestamps, it can accurately divide continuous traffic data of the same business flow, avoiding the fragmentation of behavioral features caused by fixed window cutting, and adapting to the traffic collection needs of large-scale networks with high concurrency and distributed nodes. Second, by calculating the change direction based on the packet length sequence and arrival interval sequence and constructing state transition codes, it can extract the inherent behavior pattern of traffic transmission at the micro level, capture hidden abnormal features without relying on prior attack knowledge, solve the problem that traditional macro features are easily submerged by normal traffic, and enhance the sensitivity of anomaly identification. Furthermore, the state transition codes of each node are summarized and a global observation matrix is constructed. The system integrates the local behavioral characteristics of scattered nodes into a unified analysis space for the entire network, enabling global observation of large-scale networks and breaking the limitations of single-point monitoring. Then, based on the global observation matrix and baseline feature matrix, state deviation analysis is performed to obtain a state deviation vector. Through quantitative comparison of real-time behavior with historical normal baselines, the degree of abnormal deviation of each node can be accurately quantified, reducing the false positive rate. Finally, abnormal traffic location constraints are applied based on the projection components of the state deviation vector, enabling rapid identification of the initiating node and reconstruction of the propagation path. 
This achieves accurate location and rapid source tracing of abnormal traffic events, meeting the traffic operation and maintenance needs of large-scale networks. In summary, the technical solution provided in this application can achieve location constraints for abnormal traffic events without relying on prior attack knowledge.
Attached Figure Description
[0017] Figure 1 is a schematic diagram of an application scenario architecture in which network nodes aggregate data to a central analysis node, according to some embodiments of this application;
Figure 2 is an exemplary flowchart of a real-time traffic monitoring and analysis method for large-scale networks, according to some embodiments of this application;
Figure 3 is an exemplary flowchart illustrating the determination of state transition codes, according to some embodiments of this application;
Figure 4 is a schematic diagram illustrating the generation of state symbol sequences, according to some embodiments of this application;
Figure 5 is a schematic diagram of the structure of a real-time traffic monitoring and analysis system for large-scale networks, according to some embodiments of this application;
Figure 6 is a schematic diagram of the structure of a computer device that implements a method for real-time traffic monitoring and analysis for large-scale networks, according to some embodiments of this application.
Detailed Implementation
[0018] To better understand the technical solution of this application, the technical solution of this application will be described in detail below with reference to the accompanying drawings and specific embodiments.
[0019] Referring to Figure 1, which is a schematic diagram of an application scenario architecture in which network nodes aggregate data to a central analysis node according to some embodiments of this application, the application scenario architecture includes network nodes and a central analysis node. Network nodes 1 to n on the left side of the figure are distributed traffic acquisition and preprocessing units. Each node is responsible for dynamically time-slicing local traffic data packets based on the five-tuple and timestamp to obtain traffic behavior segments, and constructing state transition codes based on the packet length and arrival interval change directions. Then, the state transition codes carrying node identifiers and time window identifiers are reported to the central analysis node on the right side through the communication link shown by the dashed line. The central analysis node, as the core of global data aggregation and processing, is responsible for receiving the state transition codes reported by each node, aggregating them according to aligned time windows of fixed length, constructing a global observation matrix, and performing node state deviation analysis in combination with the baseline feature matrix, thereby realizing the location constraint of abnormal traffic events.
[0020] Referring to Figure 2, which is an exemplary flowchart of a real-time traffic monitoring and analysis method for large-scale networks according to some embodiments of this application, the method mainly includes the following steps: In step S101, after acquiring the traffic data packets of each network node, dynamic time segmentation is performed based on the five-tuple information and timestamp of each traffic data packet to obtain multiple traffic behavior segments.
[0021] In practice, a bypass traffic probe is deployed at each monitored network node to acquire traffic data packets from each network node. The bypass traffic probe is a non-intrusive network traffic acquisition device. The traffic data packets include a five-tuple (i.e., source IP address, source port, destination IP address, destination port, and transport layer protocol), timestamp, packet length, and arrival interval. The network nodes include, but are not limited to, routers, aggregation switches, server hosts, container hosts, and virtual switches.
[0022] It should be noted that, in this application, traffic data packets refer to the raw network protocol data units that the probe captures in real time on the network node and that carry user communication behavior.
[0023] In some embodiments, dynamic time segmentation based on the five-tuple information and timestamp of each traffic data packet to obtain multiple traffic behavior segments is achieved through the following steps: Extract the 5-tuple information and timestamp of each traffic data packet to generate the initial traffic record; Each initial traffic record is hashed and split according to the 5-tuple information, so that initial traffic records with the same 5-tuple are grouped into the same processing queue. After sorting the initial traffic records in the same processing queue according to their timestamps, the timestamp of the initial traffic record at the head of the queue is taken as the starting time. The difference between each subsequent timestamp and the starting time is compared with a preset time window benchmark value. When the difference exceeds the time window benchmark value for the first time, all the initial traffic records that have been included in the current time window are encapsulated into a traffic behavior fragment, and the initial traffic record that triggered the exceedance is taken as the new starting time to start the aggregation of the next time window. Continue processing all initial traffic records in the same processing queue until all initial traffic records have been allocated to obtain multiple traffic behavior fragments.
[0024] In specific implementation, firstly, each network node captures raw traffic data packets via a bypass traffic probe. From each traffic data packet, the five-tuple information consisting of the source IP address, source port, destination IP address, destination port, and transport layer protocol is extracted, together with the timestamp of the packet's arrival time. The five-tuple information and the timestamp are combined to form an initial traffic record, which holds the identifier and time attribute of a single traffic data packet. Secondly, each initial traffic record is hashed and split according to the five-tuple information, so that initial traffic records with the same five-tuple are grouped into the same processing queue. Then, after hashing and splitting, all initial traffic records in the same processing queue are sorted in ascending order of timestamp. After sorting, the timestamp of the first initial traffic record in the queue is used as the starting time of the current time window, and the subsequent records in the queue are traversed sequentially. The time difference between the timestamp of each subsequent record and the current starting time is calculated in turn and compared with a preset time window benchmark value. When the time difference exceeds the benchmark value for the first time, data collection for the current time window is stopped, and all initial traffic records included in the current time window are encapsulated into an independent traffic behavior segment. This segment is a continuous traffic data unit divided according to the same five-tuple and the dynamic time rule. At the same time, the timestamp of the initial traffic record that triggered the exceedance is taken as the new starting time, initiating data collection for the next time window.
Finally, the process of sorting, calculating time differences, comparing thresholds, encapsulating segments, and updating the new window starting point is repeated until all initial traffic records in the same processing queue are assigned to their corresponding traffic behavior segments, ultimately resulting in multiple traffic behavior segments.
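As a minimal sketch of the dynamic time segmentation described above, the following Python example groups records by five-tuple, sorts each queue by timestamp, and closes a segment when a record's offset from the segment's starting time first exceeds the time window benchmark value. The names `Record`, `segment_queue`, and `segment_all`, and the use of seconds as the time unit, are illustrative assumptions. Grouping is done here with a plain dictionary rather than the hash-based splitting of the embodiment.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    five_tuple: tuple  # (src_ip, src_port, dst_ip, dst_port, protocol)
    timestamp: float   # arrival time, assumed to be in seconds


def segment_queue(records, window):
    """Split one same-five-tuple queue into traffic behavior segments."""
    ordered = sorted(records, key=lambda r: r.timestamp)
    segments, current, start = [], [], None
    for rec in ordered:
        if start is None:
            start = rec.timestamp
        elif rec.timestamp - start > window:
            # The triggering record closes the segment and opens the next one.
            segments.append(current)
            current, start = [], rec.timestamp
        current.append(rec)
    if current:
        segments.append(current)
    return segments


def segment_all(records, window):
    """Group records by five-tuple, then segment each queue."""
    queues = defaultdict(list)
    for rec in records:
        queues[rec.five_tuple].append(rec)
    return {ft: segment_queue(q, window) for ft, q in queues.items()}


flow = ("10.0.0.1", 1234, "10.0.0.2", 80, "TCP")
records = [Record(flow, t) for t in (0.0, 0.4, 0.9, 1.5, 1.8, 3.0)]
segments = segment_all(records, window=1.0)[flow]
segment_times = [[r.timestamp for r in seg] for seg in segments]
```

With a benchmark value of 1.0, the record at 1.5 triggers the first exceedance and becomes the start of the second segment, and the record at 3.0 likewise starts the third.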
[0025] In some embodiments, hashing each initial traffic record according to its five-tuple information to group initial traffic records with the same five-tuple into the same processing queue is achieved through the following steps: Hash the 5-tuple information in each initial traffic record to obtain a fixed-length 5-tuple hash value; Using the quintuple hash value as the input key, the initial traffic record is mapped to a hash ring consisting of multiple processing units. The target processing unit is determined by taking the modulo of the fixed-length quintuple hash value with the total number of processing units. Within each processing unit, the 5-tuple hash value carried by the initial traffic record received by the target processing unit is used as the queue index key, and initial traffic records with the same queue index key are grouped into the same processing queue.
[0026] In specific implementation, firstly, the 5-tuple information of each initial traffic record is hashed using the MD5 hash function to obtain a 128-bit fixed-length 5-tuple hash value. The MD5 hash function maps 5-tuple strings of arbitrary length to a fixed-length hash value, which helps distribute traffic evenly. Using this 128-bit fixed-length 5-tuple hash value as the mapping key, the initial traffic record is mapped to a hash ring composed of multiple parallel processing units. The hash ring is a ring-shaped data structure used to achieve distributed traffic load balancing. Then, the 128-bit 5-tuple hash value is converted into a decimal unsigned integer, and this integer is taken modulo the total number of online processing units. The result of the modulo operation is an integer greater than or equal to 0 and less than the total number of processing units. This integer directly corresponds to the number of the processing unit on the hash ring, thereby determining the target processing unit to which the current initial traffic record belongs. Finally, within the target processing unit, the 5-tuple hash value is used as the queue index key, and initial traffic records with the same index key are grouped into the same processing queue. When the number of traffic data packets of a certain 5-tuple exceeds the preset processing queue capacity threshold, either the oldest record is discarded according to the timestamp or an alarm bypass is triggered, so as to avoid queue overflow.
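As a minimal sketch of the hashing and splitting described above, the following Python example computes the MD5 digest of a serialized five-tuple, converts the 128-bit digest to an unsigned integer, selects the target processing unit by modulo, and uses the digest as the queue index key inside that unit. The serialization format (fields joined with `|`), the function names, and the in-memory representation of processing units are illustrative assumptions. The queue capacity threshold is not modeled.

```python
import hashlib
from collections import defaultdict


def five_tuple_hash(five_tuple):
    """Return the 128-bit MD5 digest of a five-tuple as 32 hex characters.

    ASSUMPTION: the five-tuple is serialized by joining its fields with '|'.
    """
    key = "|".join(str(field) for field in five_tuple)
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def target_unit(hash_hex, num_units):
    """Map the digest to a processing unit number in [0, num_units)."""
    return int(hash_hex, 16) % num_units


def dispatch(records, num_units):
    """Route (five_tuple, timestamp) records to per-unit processing queues."""
    units = [defaultdict(list) for _ in range(num_units)]
    for five_tuple, timestamp in records:
        digest = five_tuple_hash(five_tuple)
        unit = units[target_unit(digest, num_units)]
        unit[digest].append((five_tuple, timestamp))  # digest = queue index key
    return units


flow_a = ("10.0.0.1", 1234, "10.0.0.2", 80, "TCP")
flow_b = ("10.0.0.3", 5353, "10.0.0.4", 53, "UDP")
records = [(flow_a, 1.0), (flow_b, 1.1), (flow_a, 1.2)]
units = dispatch(records, num_units=4)
```

Because the unit number depends on the total number of processing units, changing that number remaps most flows. That is a property of the modulo scheme itself.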
[0027] It should be noted that the processing queue in this application refers to an ordered data cache structure used to store traffic data with the same five-tuple. By determining the processing queue, the initial traffic records with the same five-tuple can be centrally collected and temporarily stored in an orderly manner, thereby ensuring that all traffic data packets of the same network service flow are not scattered into different processing flows, avoiding mutual interference between different traffic behavior segments, and serving as a key intermediate storage and traffic isolation structure for the accuracy and logical coherence of the entire real-time traffic monitoring and analysis method.
[0028] It should also be noted that the "dynamic time segmentation" in this application is only used to divide the continuous data packet sequence within the same quintuple and generate local behavior segments. The "same time window" mentioned in the subsequent step S103 refers to a globally unified absolute time window of fixed length. Each network node needs to summarize and report all state transition codes generated within the same absolute time window. The segment boundary of the dynamic time segmentation is independent of the boundary of the global time window. The two are associated by aligning the absolute time window to which the timestamp of each traffic data packet belongs.
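As a minimal sketch of how a packet timestamp could be aligned to a globally unified absolute time window of fixed length, the following Python example derives a window identifier by integer division. The function names, the `origin` parameter, and the choice to tag a whole segment with the window of its first packet are assumptions made for illustration; this passage does not state which packet of a segment determines the window.

```python
def window_id(timestamp, window_length, origin=0.0):
    """Return the index of the fixed-length absolute window containing timestamp."""
    return int((timestamp - origin) // window_length)


def segment_window(segment_timestamps, window_length, origin=0.0):
    """ASSUMPTION: a segment is reported under the window of its first packet."""
    return window_id(segment_timestamps[0], window_length, origin)


# With 5-second windows, timestamps 4.999 and 5.0 fall into adjacent windows.
ids = [window_id(t, 5.0) for t in (4.999, 5.0, 12.3)]
tag = segment_window([12.3, 12.9, 15.2], 5.0)
```

Because every node applies the same window length and origin, codes from different nodes with the same identifier refer to the same absolute interval, regardless of where the dynamic segment boundaries fall.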
[0029] In step S102, for each traffic behavior segment, the packet length change direction and arrival interval change direction between adjacent traffic data packets are calculated based on the packet length sequence and arrival interval sequence corresponding to the traffic behavior segment, and a state transition code characterizing the micro-behavioral pattern within the traffic behavior segment is constructed using the packet length change direction and the arrival interval change direction.
[0030] In some embodiments, the following steps are used to calculate the direction of packet length change and the direction of arrival interval change between adjacent traffic data packets based on the packet length sequence and arrival interval sequence corresponding to the traffic behavior segment: Obtain the packet length sequence and arrival interval sequence arranged in the order of arrival of traffic data packets from the traffic behavior segment; For the packet length sequence, the difference between the second packet length and the first packet length in two adjacent packets is calculated in turn, and the direction of the packet length is identified based on the difference, thereby converting the packet length sequence into a packet length change direction sequence; For the arrival interval sequence, the difference between the next arrival interval and the previous arrival interval is calculated sequentially between two adjacent arrival intervals. The direction of the interval is identified based on the difference, thereby converting the arrival interval sequence into an arrival interval change direction sequence.
[0031] In a specific implementation, the packet length sequence and the arrival interval sequence are first extracted from the traffic behavior segments obtained after dynamic time segmentation, arranged according to the actual arrival order of the data packets at the network node. The packet length sequence is an ordered set composed of the length values of all traffic data packets within the traffic behavior segment in chronological order, and the arrival interval sequence is an ordered set composed of the time intervals between the arrival times of adjacent traffic data packets within the traffic behavior segment in chronological order. Then, for the packet length sequence, adjacent values are subtracted: the previous packet length value is subtracted from each subsequent packet length value to obtain a difference, and the direction is identified from the sign of the difference, converting the continuous packet length value sequence into a packet length change direction sequence containing only directional features. The packet length change direction sequence is the set of packet length change directions at different time points; it is a feature sequence representing the successive increase or decrease trend of the length of traffic data packets within a traffic behavior segment. For example, for each pair of consecutive data packets, the length of the preceding packet is subtracted from the length of the following packet; the result is marked as increasing if it is positive, as decreasing if it is negative, and as unchanged if it is zero, thus completing the conversion from a numerical sequence to a directional sequence. Finally, the arrival interval sequence is processed using the same adjacent-value subtraction and direction identification rules as the packet length sequence. The preceding arrival interval value is subtracted from each subsequent arrival interval value to obtain the difference, and the direction is identified based on the sign of the difference, thus converting the arrival interval numerical sequence into an arrival interval change direction sequence that contains only the trend of time interval change. The arrival interval change direction sequence is the set of arrival interval change directions at different time points.
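As a non-limiting illustration, the adjacent-difference and sign-identification rule described above might be sketched as follows. The function name `direction_sequence` and the sample values are hypothetical and do not come from this application; the symbols "+", "-", and "=" follow the identifiers used later in the description. The sample arrival interval list is assumed to contain one entry per packet so that both direction sequences come out the same length, and exact equality is used for the "unchanged" case, although a real deployment might need a tolerance for timestamps, which this application does not specify.

```python
from typing import List, Sequence


def direction_sequence(values: Sequence[float]) -> List[str]:
    """Convert a numeric sequence into a change direction sequence.

    For each adjacent pair, the earlier value is subtracted from the
    later value and only the sign of the difference is kept.
    """
    directions = []
    for prev, nxt in zip(values, values[1:]):
        diff = nxt - prev
        if diff > 0:
            directions.append("+")   # value increased
        elif diff < 0:
            directions.append("-")   # value decreased
        else:
            directions.append("=")   # value unchanged
    return directions


# Hypothetical sample data for one traffic behavior segment.
packet_lengths = [60, 1500, 1500, 40]              # bytes
arrival_intervals = [0.010, 0.002, 0.002, 0.050]   # seconds (assumed layout)

length_dirs = direction_sequence(packet_lengths)
interval_dirs = direction_sequence(arrival_intervals)
print(length_dirs, interval_dirs)
```

A sequence of k values yields k-1 direction identifiers, and a sequence with fewer than two values yields an empty direction sequence.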
[0032] It should be noted that the arrival interval change direction sequence in this application is a feature sequence that characterizes the successive tightening and loosening trend of the arrival time interval of traffic data packets within a traffic behavior segment. The role of determining the arrival interval change direction sequence is to transform the micro-fluctuation pattern of the arrival time interval of adjacent data packets within a traffic behavior segment into a quantifiable and comparable symbolic feature, thereby providing a directional component of the temporal dimension for the subsequent construction of state transition codes. When network traffic is affected by abnormal events, the arrival rhythm of data packets will be distorted. For example, the pulsed traffic of a distributed denial-of-service attack will cause the adjacent interval to shrink sharply, while slow attacks or covert channels may show periodic interval stretching. These temporal distortions are difficult to capture through simple mean or variance statistics.
[0033] In some embodiments, refer to Figure 3, which is an exemplary flowchart of determining state transition codes according to some embodiments of this application. In this embodiment, the state transition code characterizing the micro-behavioral pattern within the traffic behavior segment can be constructed from the packet length change direction and the arrival interval change direction by the following steps: In step S1021, the packet length change direction sequence and the arrival interval change direction sequence are obtained; In step S1022, the packet length change direction identifier and the arrival interval change direction identifier at the same sequence position are extracted sequentially from the packet length change direction sequence and the arrival interval change direction sequence. The packet length change direction identifier and the arrival interval change direction identifier are combined into a tuple, and the tuple is used as the micro-behavioral state symbol at that sequence position; In step S1023, the micro-behavioral state symbols obtained from all sequence positions within the traffic behavior segment are concatenated in the chronological order of the original sequence to form a state symbol sequence, and the state symbol sequence is used as the state transition code representing the micro-behavioral pattern within the traffic behavior segment.
[0034] In a specific implementation, the packet length change direction sequence and the arrival interval change direction sequence are first obtained, and index alignment verification is performed on both sequences. Since both sequences are derived from the same set of adjacent traffic data packets, when a traffic behavior segment contains n traffic data packets, the adjacent difference operations generate n-1 change direction identifiers for each sequence. The lengths of the two sequences are therefore consistent, and no zero padding or truncation is required. Subsequently, the feature encoding module traverses the sequence index positions from index zero to the end of the sequence and reads the direction identifier character at the current position from the packet length change direction sequence ("+" represents a positive packet length change, "-" represents a negative packet length change, and "=" represents no packet length change). At the same time, it reads the direction identifier character at the same index position from the arrival interval change direction sequence ("+" represents a positive interval change, "-" represents a negative interval change, and "=" represents no interval change). The two characters are concatenated into a tuple with the packet length identifier first and the interval identifier second. This tuple is then used as the micro-behavioral state symbol for that sequence position. The micro-behavioral state symbol is an atomic feature marker representing the joint change in state of a single adjacent data packet pair across both the packet length and interval dimensions. Finally, the micro-behavioral state symbols generated at each index position are sequentially written into a character array. Adjacent symbols within the character array are separated by delimiters (e.g., commas or semicolons). The entire character array is output as a string, which is the state symbol sequence. This state symbol sequence is then assigned as the state transition code of the current traffic behavior segment.
[0035] Refer to Figure 4, which is a schematic diagram of generating the state symbol sequence according to some embodiments of this application. The figure consists of three interconnected sub-figures arranged vertically, with the horizontal axis representing the sequence position (value range 0 to 5). It presents the hierarchical representation process of the micro-behavioral characteristics of network traffic: The first sub-figure is the packet length change direction sequence. The vertical axis uses three levels, "+", "=", and "-", to represent the direction of packet length change. Sequence positions 0 and 5 correspond to the "+" (packet length increases) state, sequence positions 1 to 4 correspond to the "-" (packet length decreases) state, and there is no "=" (packet length remains unchanged) state throughout, which shows the discretized temporal change of the packet length characteristics; The second sub-figure is the arrival interval change direction sequence. The vertical axis also uses "+", "=", and "-" to represent the direction of change in the arrival interval of data packets. Sequence positions 0, 3, and 4 correspond to the "+" state (interval increases), sequence positions 1, 2, and 5 correspond to the "-" state (interval decreases), and there is no "=" state (interval remains unchanged) throughout, which shows the temporal evolution of the time dimension characteristics of the traffic; The third sub-figure is the joint change trajectory of the micro-behavioral state symbols. With the direction characteristics of packet length and arrival interval as the joint dimension, the single-dimensional discrete features of the first two sub-figures are fused and mapped through the numerical joint state sequence, which presents the continuous temporal transition trajectory of the micro-behavioral state of network traffic and completes the representation process from single-dimensional features to joint behavioral states.
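As a non-limiting illustration, steps S1021 to S1023 might be sketched as follows, using the two direction sequences read from the Figure 4 description (positions 0 to 5) as input. The function name `build_state_transition_code`, the choice of a comma as delimiter, and the decision to raise an error when the two sequences differ in length are assumptions made for this sketch; the description only states that index alignment verification is performed.

```python
from typing import Sequence


def build_state_transition_code(
    length_dirs: Sequence[str],
    interval_dirs: Sequence[str],
    delimiter: str = ",",
) -> str:
    """Join per-position (length, interval) direction pairs into one string."""
    # Index alignment verification: both sequences must cover the same positions.
    if len(length_dirs) != len(interval_dirs):
        raise ValueError("direction sequences must have the same length")

    # Micro-behavioral state symbol: packet length identifier first,
    # arrival interval identifier second.
    symbols = [l + i for l, i in zip(length_dirs, interval_dirs)]

    # State symbol sequence, kept in the original chronological order.
    return delimiter.join(symbols)


# Direction sequences as described for Figure 4.
fig4_length_dirs = ["+", "-", "-", "-", "-", "+"]
fig4_interval_dirs = ["+", "-", "-", "+", "+", "-"]

code = build_state_transition_code(fig4_length_dirs, fig4_interval_dirs)
print(code)  # ++,--,--,-+,-+,+-
```

Because each symbol is a fixed two-character unit, two segments can be compared by plain string equality on their state transition codes.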
[0036] It should be noted that, in this application, the state transition code refers to the characteristic codeword that completely records the packet-by-packet joint behavior change trajectory within a traffic behavior segment in symbolic form. Since the packet length sequence and arrival interval sequence in the original traffic behavior segment are high-dimensional numerical time series, directly storing and comparing them would consume a lot of computational and spatial resources. Moreover, the small fluctuations in the values are difficult to directly reflect the structural changes in the micro-behavioral patterns of traffic. After jointly encoding the packet length change direction and the arrival interval change direction into a state transition code, the continuous numerical sequence is compressed into a discrete symbol sequence. The joint change trend of each adjacent data packet in the two dimensions of packet length and interval is abstracted into a fixed two-dimensional character unit, thereby transforming the traffic behavior segment into a compact and comparable behavior pattern signature, providing a calculable quantitative basis for the determination and location of abnormal traffic events.
[0037] In step S103, the state transition codes generated by each network node within the same time window are aggregated to the central analysis node to construct the global network analysis space, so as to obtain the global observation matrix.
[0038] It should be noted that, in this application, the central analysis node refers to a distributed processing server cluster that is logically centralized and undertakes global data aggregation, feature alignment, and anomaly judgment calculation in large-scale network traffic real-time monitoring and analysis. The central analysis node receives state transition codes carrying node identifiers and time window identifiers reported by each probe server through probe agents deployed in various locations of the network via in-band or out-of-band management channels. After aligning the discrete state transition codes from different network regions according to the time window identifiers, it organizes them into a global observation matrix with nodes as rows and state transition codes as columns. Furthermore, it loads historical normal traffic baseline data to perform deviation calculation and anomaly judgment, thus acting as the convergence and decision-making hub from local feature acquisition to full network situational awareness in the data processing link of the entire monitoring system.
[0039] In some embodiments, the state transition codes generated by each network node within the same time window are aggregated to the central analysis node to construct the global network analysis space, thereby obtaining the global observation matrix. This is achieved through the following steps: The central analysis node receives state transition codes reported by each network node, which carry node identifiers and time window identifiers. The received state transition codes are aggregated according to the time window identifier and, with the node identifier as the grouping key, the set of state transition codes of each node under the current time window to be analyzed is obtained; A global observation matrix is constructed using the current time window to be analyzed as the matrix index, the identifiers of the network nodes participating in the reporting as the row index, and the enumeration of all distinct state transition codes extracted from the state transition code sets of each node as the column index.
[0040] In practice, the process begins with the central analysis node using a network data reporting and receiving communication method to uniformly receive state transition codes uploaded in real time by each network node, each code accompanied by a node identifier and a time window identifier. The node identifier is a unique identifier for different network nodes, and the time window identifier is a unique identifier for different analysis periods. Then, all received state transition codes are categorized and aggregated based on the time window identifier, filtering out all state transition codes belonging to the current analysis time window. These codes are then further aggregated using the node identifier as the grouping key, forming a set of state transition codes corresponding to each network node within the current analysis time window. This set represents the total set of all micro-behavioral pattern codes of a single network node within a specified time window. Finally, a matrix-structured construction method is used, with the current analysis time window as the overall matrix index, all reported network node identifiers as the matrix row index, and the enumeration results of all non-repeating state transition codes extracted from the state transition code sets of each node as the matrix column index, thus completing the construction of the global observation matrix.
[0041] It should be noted that the global observation matrix in this application is a two-dimensional data structure used to carry the traffic behavior characteristics of all nodes in the network in the current time window. The values of its matrix elements are used to characterize the frequency of occurrence of the corresponding state transition codes of a specific network node in the time window. The role of this matrix is to uniformly map the discrete state transition codes generated by nodes distributed in different locations of the network in the same time window to the same joint feature space spanned by row vectors and column vectors, so that the traffic micro-behavior patterns of different nodes can be directly compared and numerically calculated in the same column dimension.
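As a non-limiting illustration, the aggregation by time window identifier and node identifier, followed by the matrix construction, might be sketched as follows. The report layout `(node_id, window_id, code)`, the function name `build_global_observation_matrix`, the sorted row and column order, and the sample reports are assumptions for this sketch and are not specified by this application.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical report layout: (node identifier, time window identifier, code).
Report = Tuple[str, str, str]


def build_global_observation_matrix(
    reports: Iterable[Report], window_id: str
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (row node ids, column codes, frequency matrix) for one window."""
    per_node: Dict[str, Counter] = defaultdict(Counter)
    for node_id, win, code in reports:
        if win == window_id:              # keep only the window being analyzed
            per_node[node_id][code] += 1  # group by node, count each code

    rows = sorted(per_node)                                       # reporting nodes
    cols = sorted({c for cnt in per_node.values() for c in cnt})  # distinct codes
    # Counter returns 0 for codes a node never reported in this window.
    matrix = [[per_node[n][c] for c in cols] for n in rows]
    return rows, cols, matrix


reports = [
    ("n1", "w7", "+-,--"),
    ("n1", "w7", "+-,--"),
    ("n2", "w7", "=+,++"),
    ("n2", "w6", "+-,--"),   # belongs to another window and is filtered out
    ("n3", "w7", "+-,--"),
]
rows, cols, matrix = build_global_observation_matrix(reports, "w7")
print(rows, cols, matrix)
```

Each matrix element is the number of times the column's state transition code was reported by the row's node in the selected window, which matches the frequency-of-occurrence meaning given to the matrix elements above.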
[0042] In step S104, node state deviation analysis is performed on each network node based on the global observation matrix of the current time window and the baseline feature matrix composed of historical normal traffic state transition codes, thereby obtaining the state deviation vector corresponding to each network node.
[0043] In some embodiments, the node state deviation analysis of each network node is performed based on the global observation matrix of the current time window and the baseline feature matrix composed of historical normal traffic state transition codes, and the state deviation vector corresponding to each network node is obtained by the following steps: A baseline feature matrix is determined by state transition codes of historical normal traffic. The baseline feature matrix uses network node identifiers as row indices and state transition codes enumerated from historical normal traffic as column indices. The matrix elements record the historical expected frequency of each state transition code corresponding to each network node. Align the global observation matrix of the current time window with the baseline feature matrix in rows and columns; For each row vector in the global observation matrix after row and column alignment, the frequency of occurrence of the current time window state transition code represented by each column in the row vector is extracted, and the frequency of occurrence of the current time window state transition code is compared with the historical expected frequency of the corresponding column in the corresponding row vector in the baseline feature matrix one by one to calculate the frequency deviation of the network node on each state transition code. The state deviation vector of a network node is composed of the frequency deviations of all state transition codes of that network node.
[0044] The baseline feature matrix, composed of historical normal traffic state transition codes, is determined using the following steps: Collect the state transition codes carrying node identifiers reported by each network node during normal network operation periods, and use them as historical normal state transition code samples. After grouping the historical normal state transition code samples by node identifier, the frequency of codewords is counted for the historical normal state transition code samples corresponding to each node identifier to obtain the frequency of occurrence of each state transition code under each node identifier. The frequency of occurrence of each state transition code under each node identifier is normalized by the number of time windows corresponding to the duration of the normal operation period, so as to obtain the historical expected frequency of each state transition code under each node identifier. A baseline feature matrix is constructed using the network node identifier as the row index, the enumeration of all distinct state transition codes appearing in the historical normal state transition code samples as the column index, and the historical expected frequency as the matrix element.
[0045] In a specific implementation, state transition codes carrying the corresponding node identifiers, reported by each network node during normal operation, are first collected. These are used as historical normal state transition code samples to construct baseline features. These historical normal state transition code samples are a set of standard codewords characterizing the micro-traffic behavior patterns of network nodes under stable and normal operation. Next, a key-based grouping method is used to classify all historical normal state transition code samples according to node identifiers, ensuring that state transition codes corresponding to the same network node are grouped together. Then, a frequency statistics algorithm is used to count the various state transition codes within each node identifier group, obtaining the frequency of occurrence of the different state transition codes under each network node. The frequency of occurrence is a statistical value reflecting the number of times the various micro-behavioral patterns of a single network node occur under normal conditions. Following this, a time window normalization method is adopted. The total number of time windows corresponding to the total duration of the normal operation period is used as the normalization denominator, and the occurrence frequency of each state transition code under each node identifier is divided by it to convert the absolute count into a relative value, so as to obtain the historical expected frequency of each state transition code under each network node. The historical expected frequency is a standardized value that characterizes the average occurrence of each micro-behavioral pattern of the network node per time window under normal conditions. Finally, a matrix construction method is adopted. The network node identifier is used as the matrix row index, the enumeration of all non-repeating state transition codes in the historical normal state transition code samples is used as the matrix column index, and the historical expected frequency at the corresponding position is used as the matrix element to complete the construction of the baseline feature matrix. The baseline feature matrix is the global normal behavior benchmark matrix used for subsequent real-time traffic state deviation comparison.
[0046] It should be noted that the baseline feature matrix in this application refers to a reference feature matrix that uses network node identifiers as matrix row indices, the enumeration of all non-repeating state transition codes in the historical normal state transition code samples as matrix column indices, and the historical expected frequency of the corresponding position as matrix elements. Since there are inherent differences in the business roles and traffic composition of different nodes in a large-scale network, if a single global threshold or static rule is used for anomaly detection, it will be impossible to distinguish the inherent behavioral characteristics of nodes from real abnormal deviations. It is therefore necessary to establish an independent and quantified normal behavior reference benchmark for each node. By collecting state transition code samples during normal operation periods and statistically analyzing the historical expected frequency for each codeword of each node, the constructed baseline feature matrix uses nodes as rows and state transition codes as columns. It characterizes the expected occurrence of each micro-behavioral pattern of each node in the normal state as a set of numerical vectors, thereby establishing a normal behavior model for each node that conforms to its own business characteristics.
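As a non-limiting illustration, the grouping, counting, and time window normalization used to build the baseline feature matrix might be sketched as follows. The sample layout `(node_id, code)`, the function name `build_baseline_matrix`, the parameter `num_windows` (the number of time windows covered by the historical sample collection period), and the sample values are assumptions for this sketch.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def build_baseline_matrix(
    samples: Iterable[Tuple[str, str]], num_windows: int
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Return (row node ids, column codes, historical expected frequencies)."""
    if num_windows <= 0:
        raise ValueError("num_windows must be positive")

    counts: Dict[str, Counter] = defaultdict(Counter)
    for node_id, code in samples:
        counts[node_id][code] += 1    # group by node identifier, count codewords

    rows = sorted(counts)
    cols = sorted({c for cnt in counts.values() for c in cnt})
    # Normalize each absolute count by the number of time windows so that the
    # element is the expected number of occurrences per window.
    matrix = [[counts[n][c] / num_windows for c in cols] for n in rows]
    return rows, cols, matrix


# Hypothetical historical samples collected over two time windows.
samples = [("n1", "+-")] * 4 + [("n1", "--")] * 2 + [("n2", "--")] * 6
base_rows, base_cols, base_matrix = build_baseline_matrix(samples, num_windows=2)
print(base_rows, base_cols, base_matrix)
```

Normalizing by the window count puts the baseline on the same per-window scale as the global observation matrix, so the two can be subtracted element by element.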
[0047] In practice, the global observation matrix for the current time window and the pre-built baseline feature matrix are first obtained. Consistency processing is performed on the row and column index sets of the two matrices. The column index of the baseline feature matrix is used as the reference column space, and the intersection of the row indices of the baseline feature matrix and the global observation matrix is used as the common row space. The global observation matrix is then expanded in columns and filtered in rows. For state transition codes not appearing in the current window, zeros are padded at the corresponding positions of the nodes. Nodes that did not report in the current window are removed from this round of analysis, ensuring that the aligned global observation matrix and the baseline feature matrix have identical row and column dimensions. Next, for each row vector of the global observation matrix after row and column alignment, the actual occurrence frequency of the corresponding state transition code within the current time window is extracted column by column. At the same time, the historical expected frequency at the same row and column position is located in the baseline feature matrix. The difference between the two is calculated by point-by-point numerical subtraction to obtain the frequency deviation of the network node on the corresponding state transition code. The frequency deviation is a single-dimensional quantitative value that reflects the degree of difference between real-time traffic behavior and historical normal behavior. Finally, the frequency deviations obtained by the network node on all state transition codes are combined in a fixed column order to form a state deviation vector that represents the overall abnormal deviation degree of the network node.
[0048] It should be noted that in this application, the state deviation vector refers to the deviation of a network node from its normal behavior baseline in the corresponding micro-behavioral pattern. Determining the state deviation vector involves organizing the isolated frequency deviations of the network node in multiple state transition code dimensions into a joint deviation representation that can be uniformly computed, so that the degree of abnormality of the node no longer depends on the threshold trigger of a single codeword, but is measured by the magnitude and direction of the vector as a whole.
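As a non-limiting illustration, the row and column alignment and the point-by-point subtraction might be sketched as follows. The nested-dictionary layout (`node -> code -> frequency`), the function name `state_deviation_vectors`, and the sample values are assumptions for this sketch. Following the description, the baseline columns define the column space, so a code observed in the current window but absent from the baseline columns does not contribute here; this is one reading of the alignment rule, not a confirmed detail of the method.

```python
from typing import Dict, List, Sequence

Matrix = Dict[str, Dict[str, float]]   # node id -> {state transition code: frequency}


def state_deviation_vectors(
    observed: Matrix, baseline: Matrix, columns: Sequence[str]
) -> Dict[str, List[float]]:
    """Return, per node, observed minus expected frequency in column order."""
    # Common row space: nodes present in both matrices.
    common_nodes = sorted(set(observed) & set(baseline))
    return {
        node: [
            # Codes missing from the current window are zero-padded.
            observed[node].get(code, 0.0) - baseline[node].get(code, 0.0)
            for code in columns
        ]
        for node in common_nodes
    }


columns = ["+-", "--"]                      # baseline column order
baseline = {
    "n1": {"+-": 2.0, "--": 1.0},
    "n2": {"+-": 0.0, "--": 3.0},
    "n3": {"+-": 1.0, "--": 1.0},           # did not report in this window
}
observed = {
    "n1": {"+-": 5},
    "n2": {"--": 3},
    "n9": {"+-": 1},                        # no baseline row, so it is dropped
}

deviations = state_deviation_vectors(observed, baseline, columns)
print(deviations)
```

In this sample, node n1 shows a surplus of the "+-" code and a deficit of the "--" code, while node n2 matches its baseline exactly and yields a zero vector.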
[0049] In step S105, the abnormal traffic events within the current time window are located and constrained based on the projection components of the state deviation vectors corresponding to each network node in the global network analysis space.
[0050] In some embodiments, the location constraint of abnormal traffic events within the current time window based on the projection components of the state deviation vector corresponding to each network node in the global network analysis space is achieved by the following steps: The global state deviation vector is obtained by summing the state deviation vectors of all network nodes. For each network node, the inner product of the network node's state deviation vector and the global state deviation vector is calculated to obtain the magnitude of the projection component of that network node. The network node with the largest projection component is marked as the abnormal traffic initiation point. In the global observation matrix, network nodes that have at least one state transition code in common with the abnormal traffic initiation point are extracted to form an associated node set. The projection component magnitude of each network node in the associated node set is compared with a preset propagation threshold. Network nodes whose projection component magnitude exceeds the preset propagation threshold are selected, and the selected network nodes are arranged in descending order of projection component magnitude. The sequence of network nodes arranged in descending order is determined as the propagation location path of the abnormal traffic.
[0051] In a specific implementation, using element-wise vector addition, the state deviation vectors corresponding to all network nodes in the global network analysis space are first summed dimension by dimension to obtain the global state deviation vector, which represents the overall abnormality level of the current time window. The global state deviation vector is a comprehensive vector reflecting the abnormal distribution characteristics of global network traffic. Then, a vector inner product operation is performed for each network node individually: the node's state deviation vector is multiplied by the global state deviation vector in each corresponding dimension and the results are summed to obtain the magnitude of the node's projection component in the global network analysis space. The magnitude of the projection component is a quantitative value that measures the correlation between the degree of deviation of a single network node from the normal state and the global abnormal trend. Next, the node with the largest projection component among all network nodes is marked as the abnormal traffic initiation point. The abnormal traffic initiation point is the network node that first generates abnormal traffic behavior within the current time window. At the same time, other network nodes that have at least one state transition code identical to a code of the abnormal traffic initiation point are selected from the global observation matrix. These nodes are grouped into a set of associated nodes, which is a group of potential propagation nodes associated with the abnormal traffic behavior characteristics. Finally, the projection component of each node in the set of associated nodes is compared with a preset propagation threshold. Nodes with projection components greater than the propagation threshold are retained. These nodes are then sorted in descending order of projection component to form the final propagation location path of the abnormal traffic.
[0052] It should be noted that, in this application, the propagation location path refers to the propagation path of an abnormal traffic event, which is an ordered sequence of nodes that reflects the order of node transmission and the scope of influence of the abnormal traffic spreading from the initiating point to the entire network.
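As a non-limiting illustration, the summation, inner product, initiation point selection, and threshold-based path construction might be sketched as follows. The function name `locate_anomaly`, the `observed_codes` mapping (node to the set of codes it reported in the current window), the threshold value, and the sample data are assumptions for this sketch. The initiation point is returned separately from the path, since the description forms the associated node set from the other nodes; whether the path should also begin with the initiation point is not settled by the text.

```python
from typing import Dict, List, Set, Tuple


def locate_anomaly(
    deviations: Dict[str, List[float]],
    observed_codes: Dict[str, Set[str]],
    propagation_threshold: float,
) -> Tuple[str, List[str], Dict[str, float]]:
    """Return (initiation point, propagation path, projection per node)."""
    if not deviations:
        raise ValueError("no state deviation vectors to analyze")

    dim = len(next(iter(deviations.values())))
    # Global state deviation vector: element-wise sum over all nodes.
    global_vec = [sum(v[k] for v in deviations.values()) for k in range(dim)]

    # Projection component: inner product with the global vector.
    projection = {
        node: sum(a * b for a, b in zip(vec, global_vec))
        for node, vec in deviations.items()
    }

    origin = max(projection, key=projection.get)

    # Associated nodes share at least one state transition code with the origin.
    associated = [
        n for n in deviations
        if n != origin and observed_codes[n] & observed_codes[origin]
    ]
    path = sorted(
        (n for n in associated if projection[n] > propagation_threshold),
        key=lambda n: projection[n],
        reverse=True,
    )
    return origin, path, projection


deviations = {
    "n1": [3.0, -1.0], "n2": [1.0, 0.0], "n3": [0.0, 0.5], "n4": [2.0, 0.0],
}
observed_codes = {"n1": {"+-", "--"}, "n2": {"+-"}, "n3": {"--"}, "n4": {"+-"}}

origin, path, projection = locate_anomaly(deviations, observed_codes, 1.0)
print(origin, path)  # n1 ['n4', 'n2']
```

In this sample the global state deviation vector is [6.0, -0.5]; node n3 shares a code with the initiation point but its projection component (-0.25) falls below the threshold, so it is left out of the path.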
[0053] Furthermore, in another aspect of this application, in some embodiments, this application provides a real-time traffic monitoring and analysis system for large-scale networks. Refer to Figure 5, which is a schematic diagram of the structure of a real-time traffic monitoring and analysis system for large-scale networks according to some embodiments of this application. The real-time traffic monitoring and analysis system for large-scale networks includes an acquisition module 201, a processing module 202, and an execution module 203, which are described below: The acquisition module 201 in this application is mainly used to perform dynamic time segmentation based on the five-tuple information and timestamp of each traffic data packet after acquiring the traffic data packets of each network node, so as to obtain multiple traffic behavior segments. The processing module 202 in this application is mainly used to calculate the packet length change direction and arrival interval change direction between adjacent traffic data packets based on the packet length sequence and arrival interval sequence corresponding to each traffic behavior segment, and to construct a state transition code that characterizes the micro-behavioral pattern within the traffic behavior segment from the packet length change direction and the arrival interval change direction. The processing module 202 is also used to aggregate the state transition codes generated by each network node within the same time window to the central analysis node to construct the global network analysis space, so as to obtain the global observation matrix. In addition, the processing module 202 is also used to perform node state deviation analysis on each network node based on the global observation matrix of the current time window and the baseline feature matrix composed of historical normal traffic state transition codes, so as to obtain the state deviation vector corresponding to each network node. The execution module 203 in this application is mainly used to locate and constrain abnormal traffic events in the current time window based on the projection components of the state deviation vectors corresponding to each network node in the global network analysis space.
[0054] In addition, this application also provides a computer device, the computer device including a memory and a processor, the memory storing code, the processor being configured to acquire the code and execute the above-described method for real-time traffic monitoring and analysis for large-scale networks.
[0055] In some embodiments, refer to Figure 6, which is a schematic diagram of the structure of a computer device implementing a real-time traffic monitoring and analysis method for large-scale networks according to some embodiments of this application. The real-time traffic monitoring and analysis method for large-scale networks in the above embodiments can be implemented by the computer device shown in Figure 6, and the computer device includes at least one processor 301, a communication bus 302, a memory 303, and at least one communication interface 304.
[0056] The processor 301 can be a general-purpose central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more devices used to control the execution of the real-time traffic monitoring and analysis method for large-scale networks described in this application.
[0057] The communication bus 302 can be used to transmit information between the aforementioned components.
[0058] The memory 303 may be a read-only memory (ROM) or other type of static storage device capable of storing static information and instructions, a random access memory (RAM) or other type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disks or other magnetic storage devices, or any other medium capable of carrying or storing desired program code in the form of instructions or data structures and accessible by a computer, but is not limited thereto. The memory 303 may exist independently and be connected to the processor 301 via the communication bus 302. The memory 303 may also be integrated with the processor 301.
[0059] The memory 303 stores program code for executing the scheme of this application, and its execution is controlled by the processor 301. The processor 301 executes the program code stored in the memory 303. The program code may include one or more software modules. In the above embodiments, the determination of the real-time traffic monitoring and analysis method for large-scale networks can be implemented by the processor 301 and one or more software modules in the program code in the memory 303.
[0060] Communication interface 304 uses any transceiver-like device for communicating with other devices or communication networks, such as Ethernet, radio access network (RAN), wireless local area networks (WLAN), etc.
[0061] In a specific implementation, as one example, a computer device may include multiple processors, each of which may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. Here, a processor may refer to one or more devices, circuits, and / or processing cores used to process data (e.g., computer program instructions).
[0062] The aforementioned computer device can be a general-purpose computer device or a special-purpose computer device. In specific implementations, the computer device can be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, a communication device, or an embedded device. This application does not limit the type of computer device.
[0063] In addition, this application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described method for real-time traffic monitoring and analysis for large-scale networks.
[0064] Although preferred embodiments of this application have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments as well as all changes and modifications falling within the scope of this application.
[0065] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. Therefore, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include such modifications and variations.
Claims
1. A method for real-time traffic monitoring and analysis of large-scale networks, characterized in that it includes the following steps: after acquiring the traffic data packets of each network node, performing dynamic time segmentation based on the five-tuple information and timestamp of each traffic data packet to obtain multiple traffic behavior segments; for each traffic behavior segment, calculating the packet length change direction and the arrival interval change direction between adjacent traffic data packets based on the packet length sequence and the arrival interval sequence corresponding to the traffic behavior segment; constructing, from the packet length change direction and the arrival interval change direction, a state transition code characterizing the micro-behavioral pattern within the traffic behavior segment; aggregating the state transition codes generated by each network node within the same time window to a central analysis node to construct a global network analysis space, so as to obtain a global observation matrix; performing node state deviation analysis on each network node based on the global observation matrix of the current time window and a baseline feature matrix composed of historical normal traffic state transition codes, to obtain the state deviation vector corresponding to each network node; and locating and constraining the abnormal traffic events within the current time window based on the projection components of the state deviation vectors of each network node in the global network analysis space.
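By way of non-limiting illustration, the node state deviation analysis of claim 1 might be sketched in Python as follows. The claim does not specify how the deviation is computed. The sketch assumes, purely for illustration, that each row of the global observation matrix and of the baseline feature matrix holds occurrence counts over the same columns, and that the deviation is the element-wise difference of the normalized rows. The function name `deviation_vectors` and the dictionary layout are hypothetical, and the projection step is not modeled.

```python
from typing import Dict, List


def deviation_vectors(
    observed: Dict[str, List[float]],
    baseline: Dict[str, List[float]],
) -> Dict[str, List[float]]:
    """Return one state deviation vector per node (assumed definition).

    Both inputs map a node identifier to a row of counts over the same
    state-transition-code columns. Rows are normalized to frequencies so
    that nodes with different traffic volumes stay comparable.
    """

    def normalize(row: List[float]) -> List[float]:
        total = sum(row)
        # An all-zero row (no traffic observed) normalizes to zeros.
        return [v / total for v in row] if total else [0.0] * len(row)

    return {
        node: [o - b for o, b in zip(normalize(row), normalize(baseline[node]))]
        for node, row in observed.items()
    }
```

Under this assumed definition, a node whose code distribution matches its baseline yields an all-zero vector, while a shift of traffic toward unusual codes produces non-zero components.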
2. The method as described in claim 1, characterized in that performing dynamic time segmentation based on the five-tuple information and timestamp of each traffic data packet to obtain multiple traffic behavior segments specifically includes: extracting the five-tuple information and timestamp of each traffic data packet to generate an initial traffic record; hashing and distributing each initial traffic record according to its five-tuple information, so that initial traffic records with the same five-tuple are grouped into the same processing queue; after sorting the initial traffic records in the same processing queue by timestamp, taking the timestamp of the initial traffic record at the head of the queue as the starting time; comparing the difference between each subsequent timestamp and the starting time with a preset time window benchmark value; when the difference exceeds the time window benchmark value for the first time, encapsulating all the initial traffic records already included in the current time window into one traffic behavior segment, and taking the timestamp of the initial traffic record that triggered the exceedance as the new starting time to start the aggregation of the next time window; and continuing to process the initial traffic records in the same processing queue until all of them have been allocated, so as to obtain multiple traffic behavior segments.
3. The method as described in claim 2, characterized in that hashing and distributing each initial traffic record according to its five-tuple information, so that initial traffic records with the same five-tuple are grouped into the same processing queue, specifically includes: hashing the five-tuple information in each initial traffic record to obtain a fixed-length five-tuple hash value; using the five-tuple hash value as the input key, mapping the initial traffic record onto a hash ring consisting of multiple processing units, the target processing unit being determined by taking the fixed-length five-tuple hash value modulo the total number of processing units; and, within the target processing unit, using the five-tuple hash value carried by each received initial traffic record as the queue index key, and grouping initial traffic records with the same queue index key into the same processing queue.
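By way of non-limiting illustration, the hash distribution of claim 3 might be sketched in Python as follows. The claim does not name a hash function. The use of SHA-256 truncated to 64 bits, the record layout (a dictionary with a `five_tuple` key), and the names `five_tuple_hash` and `dispatch` are assumptions made here for illustration. Each processing unit is modeled as a dictionary of queues keyed by the hash value.

```python
import hashlib
from collections import defaultdict
from typing import Any, Dict, List, Sequence


def five_tuple_hash(five_tuple: Sequence[Any]) -> int:
    """Return a fixed-length (64-bit) hash of a five-tuple (assumed scheme)."""
    key = "|".join(str(field) for field in five_tuple).encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def dispatch(records: List[dict], num_units: int) -> List[Dict[int, List[dict]]]:
    """Route records to processing units, then to per-five-tuple queues."""
    units: List[Dict[int, List[dict]]] = [defaultdict(list) for _ in range(num_units)]
    for rec in records:
        h = five_tuple_hash(rec["five_tuple"])
        # Target unit: hash value modulo the total number of processing units.
        # Queue index key inside the unit: the hash value itself.
        units[h % num_units][h].append(rec)
    return units
```

Since the target unit depends only on the hash value, all packets of one flow reach the same unit and the same queue regardless of arrival order.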
4. The method as described in claim 1, characterized in that calculating the packet length change direction and the arrival interval change direction between adjacent traffic data packets based on the packet length sequence and the arrival interval sequence corresponding to the traffic behavior segment specifically includes: obtaining, from the traffic behavior segment, the packet length sequence and the arrival interval sequence arranged in the order of arrival of the traffic data packets; for the packet length sequence, calculating in turn the difference between the later packet length and the earlier packet length of each pair of adjacent packets, and identifying the packet length change direction from the difference, thereby converting the packet length sequence into a packet length change direction sequence; and, for the arrival interval sequence, calculating in turn the difference between the later arrival interval and the earlier arrival interval of each pair of adjacent arrival intervals, and identifying the arrival interval change direction from the difference, thereby converting the arrival interval sequence into an arrival interval change direction sequence.
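By way of non-limiting illustration, the conversion of claim 4 might be sketched in Python as follows. The encoding of a direction as `1` (increase), `0` (unchanged), or `-1` (decrease) and the function names are assumptions; the claim only requires that the direction be identified from the difference.

```python
from typing import List, Sequence


def sign(x: float) -> int:
    """Return 1 for a positive value, -1 for a negative value, 0 for zero."""
    return (x > 0) - (x < 0)


def change_directions(values: Sequence[float]) -> List[int]:
    """Map a sequence to the directions of change between adjacent items.

    Each output element is sign(later - earlier), so a sequence of n
    values yields n - 1 directions.
    """
    return [sign(later - earlier) for earlier, later in zip(values, values[1:])]
```

The same function applies to both sequences. For example, packet lengths `[100, 120, 90, 90]` give `[1, -1, 0]`, and arrival intervals `[400, 600, 200]` (in milliseconds) give `[1, -1]`.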
5. The method as described in claim 1, characterized in that constructing, from the packet length change direction and the arrival interval change direction, the state transition code characterizing the micro-behavioral pattern within the traffic behavior segment specifically includes: obtaining the packet length change direction sequence and the arrival interval change direction sequence; extracting in turn, from the packet length change direction sequence and the arrival interval change direction sequence, the packet length change direction identifier and the arrival interval change direction identifier at the same sequence position; combining the packet length change direction identifier and the arrival interval change direction identifier into a tuple, and using the tuple as the micro-behavioral state symbol at that sequence position; and concatenating the micro-behavioral state symbols obtained at all sequence positions within the traffic behavior segment in the chronological order of the original sequences to form a state symbol sequence, the state symbol sequence being used as the state transition code characterizing the micro-behavioral pattern within the traffic behavior segment.
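By way of non-limiting illustration, the construction of claim 5 might be sketched in Python as follows. The direction identifiers are assumed to be the integers `1`, `0`, and `-1`, and the function name is hypothetical. The two direction sequences may differ in length; the claim does not state how positions are aligned in that case, so the sketch simply pairs positions up to the length of the shorter sequence, which is an assumption.

```python
from typing import Sequence, Tuple

StateSymbol = Tuple[int, int]  # (length direction, interval direction)


def state_transition_code(
    length_dirs: Sequence[int],
    interval_dirs: Sequence[int],
) -> Tuple[StateSymbol, ...]:
    """Pair the two direction sequences position by position.

    Each pair is one micro-behavioral state symbol; the symbols are kept
    in chronological order. zip() stops at the shorter sequence.
    """
    return tuple(zip(length_dirs, interval_dirs))
```

Returning an immutable tuple makes the code hashable, so identical codes from different segments or nodes can be counted or used directly as dictionary keys.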
6. The method as described in claim 1, characterized in that aggregating the state transition codes generated by each network node within the same time window to the central analysis node to construct the global network analysis space, so as to obtain the global observation matrix, specifically includes: receiving, by the central analysis node, the state transition codes reported by each network node, each carrying a node identifier and a time window identifier; aggregating the received state transition codes according to the time window identifier, with the node identifier as the grouping key, to obtain the set of state transition codes of each node under the current time window to be analyzed; and constructing the global observation matrix with the current time window to be analyzed as the matrix index, the identifiers of the network nodes participating in the reporting as the row index, and the enumeration of all distinct state transition codes extracted from the state transition code sets of each node as the column index.
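By way of non-limiting illustration, the aggregation of claim 6 might be sketched in Python as follows. The claim fixes the matrix index, row index, and column index but does not state what each cell holds. The sketch assumes each cell is the number of times a node reported a given state transition code in that window. The report layout `(window_id, node_id, code)` and the function name are likewise hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Report = Tuple[Hashable, str, Hashable]  # (window_id, node_id, code)


def build_observation_matrices(
    reports: Iterable[Report],
) -> Dict[Hashable, Tuple[List[str], list, List[List[int]]]]:
    """Build one (row labels, column labels, matrix) triple per time window."""
    grouped = defaultdict(lambda: defaultdict(Counter))
    for window_id, node_id, code in reports:
        # Aggregate by time window first, then group by node identifier.
        grouped[window_id][node_id][code] += 1

    result = {}
    for window_id, per_node in grouped.items():
        nodes = sorted(per_node)  # row index: reporting nodes
        # Column index: all distinct codes seen in this window.
        codes = sorted({c for counts in per_node.values() for c in counts})
        matrix = [[per_node[n][c] for c in codes] for n in nodes]
        result[window_id] = (nodes, codes, matrix)
    return result
```

The codes are assumed to be mutually comparable (for example, all strings or all tuples of integers) so that the column order is deterministic.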
7. The method as described in claim 1, characterized in that the traffic data packets of each network node are obtained by means of bypass traffic probes.
8. A real-time traffic monitoring and analysis system for large-scale networks, used to execute the real-time traffic monitoring and analysis method for large-scale networks as described in any one of claims 1 to 7, characterized in that the system includes: an acquisition module, used to perform dynamic time segmentation on the traffic data packets of each network node based on the five-tuple information and timestamp of each traffic data packet to obtain multiple traffic behavior segments; a processing module, used to calculate the packet length change direction and the arrival interval change direction between adjacent traffic data packets based on the packet length sequence and the arrival interval sequence corresponding to each traffic behavior segment, and to construct, from the packet length change direction and the arrival interval change direction, a state transition code characterizing the micro-behavioral pattern within the traffic behavior segment; the processing module being also used to aggregate the state transition codes generated by each network node within the same time window to a central analysis node to construct a global network analysis space, so as to obtain a global observation matrix; the processing module being also used to perform node state deviation analysis on each network node based on the global observation matrix of the current time window and a baseline feature matrix composed of historical normal traffic state transition codes, so as to obtain the state deviation vector corresponding to each network node; and an execution module, used to locate and constrain abnormal traffic events within the current time window based on the projection components of the state deviation vectors of each network node in the global network analysis space.
9. A computer device, characterized in that the computer device includes a memory and a processor, the memory storing code, and the processor being configured to acquire the code and execute the real-time traffic monitoring and analysis method for large-scale networks as described in any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the real-time traffic monitoring and analysis method for large-scale networks as described in any one of claims 1 to 7.