Intelligent research and judgment system and method for multi-source data of a target range

By using a pluggable microkernel architecture and distributed stream processing technology, combined with multidimensional aggregation and causal correlation matching algorithms, the problem of data dispersion and identification of complex attacks in intelligent analysis of multi-source data in the target range is solved. This enables efficient and real-time intelligent analysis and decision support, improving the efficiency of actual combat response.

CN122247757A · Pending · Publication Date: 2026-06-19 · SICHUAN YILAN SITUATION TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: SICHUAN YILAN SITUATION TECH CO LTD
Filing Date: 2026-05-20
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing technologies for intelligent analysis of multi-source data in target ranges suffer from problems such as scattered data sources, inconsistent protocol interfaces, lack of structured design in analysis processes, reliance on static rules for event correlation, lack of quantitative scoring mechanisms, and insufficient visualization. These issues lead to fragmented information, difficulty in identifying complex attacks, and low efficiency in actual combat response.

Method used

It adopts a unified access gateway with a pluggable microkernel architecture, a distributed stream processing architecture, a multi-dimensional aggregation and causal association matching algorithm, a multi-factor weighted scoring decision engine, and a front-end and back-end separated visualization display to achieve efficient aggregation, standardized processing, intelligent correlation analysis, and dynamic decision support of multi-source data.

Benefits of technology

It achieves lossless aggregation and real-time standardized processing of multi-source data, improves the ability to identify complex attack chains, enhances scenario adaptability and decision support capabilities, and supports real-time situation visualization and post-event review analysis.



Abstract

This invention discloses an intelligent analysis system and method for multi-source data in a target range, relating to the field of data processing technology. The system includes: collecting multi-source heterogeneous data through a pluggable microkernel gateway and uniformly encapsulating it; cleaning and standardizing the data using a distributed stream processing architecture to output a unified event model stream; aggregating events and establishing causal relationships based on time sliding windows and time-series rules to construct an analysis event chain; calculating a multi-factor weighted score for the analysis event chain according to a pre-set multi-dimensional rule template and dynamic strategy set to obtain a comprehensive risk value, and automatically determining the risk level based on a preset threshold range; acquiring the analysis result data through a front-end / back-end separation architecture and rendering statistical charts, dynamic topology views, and attack timelines, while automatically generating a drill report. This invention achieves efficient data aggregation and real-time standardization, improving the ability to analyze complex attacks, scenario adaptability, and the visualization level of the entire process situation.

Description

Technical Field

[0001] This invention relates to the field of data processing technology, specifically to an intelligent analysis system and method for multi-source data from a target range.

Background Technology

[0002] With the widespread application of cyber ranges in attack and defense drills, security assessments, and realistic training, the types of data generated during their operation are becoming increasingly diverse, covering network traffic, system logs, alarm information, vulnerability detection results, and user operation records, exhibiting characteristics of high concurrency, strong heterogeneity, and large scale. This multi-source data is the core basis for judging attack behavior within the range, assessing defense capabilities, and reviewing drills. Its efficient fusion and intelligent analysis directly determine the range system's situational awareness level and decision support capabilities.

[0003] Among these technologies, the intelligent analysis of multi-source data from the test range aims to achieve an end-to-end closed loop from raw event collection to security posture output. Through unified access, standardized processing, cross-dimensional correlation, and risk quantification of heterogeneous data, it enables accurate identification and dynamic assessment of various security events, including basic attacks, complex attacks, threshold attacks, and violations. The core of this technology lies in building an analysis platform with high compatibility, strong real-time performance, and deep reasoning capabilities to support the visualization, traceability, and measurability of the entire offensive and defensive process.

[0004] Existing technologies have significant shortcomings in addressing these needs: First, data sources are scattered and protocol interfaces are inconsistent, making it difficult to seamlessly aggregate multi-channel data such as Syslog, RESTful APIs, and message queues, resulting in fragmented information. Second, the analysis process lacks a structured design, with key steps such as cleaning, alignment, and time calibration operating independently, failing to form a standardized event model. Third, event correlation relies on static rules, making it difficult to effectively identify multi-stage attack chains, especially when intermediate links are missing, hindering logical completion and intelligent deduction. Finally, the analysis results lack quantitative scoring mechanisms and scenario-adaptive strategies, employing a "one-size-fits-all" static rule that cannot adapt to the fundamental differences in judgment logic between red team / blue team exercises (focusing on discovering covert attacks) and CTF competitions (focusing on scoring and flag submission), leading to the easy omission of low-frequency advanced threats in practical modes or the generation of numerous invalid alerts in competition modes. Furthermore, visualization and report generation are mostly static displays, making it difficult to support real-time scheduling and post-event review. These problems severely restrict the intelligent analysis capabilities and combat response efficiency of the range system in complex offensive and defensive scenarios, and urgently require an integrated, engineered, and scalable multi-source data intelligent analysis system to solve them.

Summary of the Invention

[0005] The purpose of this invention is to overcome the shortcomings of the prior art and to provide a multi-source data intelligent analysis system and method for target ranges, which can effectively solve the problems in the background art.

[0006] To solve the above problems, the technical solution adopted by the present invention is as follows:

[0007] On the one hand, the present invention provides a target range multi-source data intelligent analysis system, which includes the following components:

[0008] The data access module is built on a pluggable microkernel architecture to create a unified access gateway. It is used to instantiate multiple protocol adapters in parallel to establish independent communication channels with attack machines, defense machines, monitoring nodes and log platforms in the test range. It also performs unified encapsulation and metadata injection on the raw data accessed through different protocols to form internal general data objects.

[0009] The data processing and standardization module, built on a distributed stream processing architecture, is used to clean, parse, align fields, calibrate timestamps, and label the internal general data objects, and output a structured unified event model stream.

[0010] The event aggregation and correlation analysis module is used to perform multi-dimensional attribute aggregation on the unified event model stream based on a time sliding window to form event buckets to be analyzed, and to perform causal correlation matching on the events in the buckets based on a preset time-series correlation rule library to construct a complete analysis event chain including attack task identifiers.

[0011] The analysis and decision engine is used to perform multi-factor weighted scoring calculation on the analysis event chain based on the preset multi-dimensional rule template and dynamic strategy set, to obtain a comprehensive risk value, and to automatically determine the risk level based on a preset threshold range.

[0012] The visualization and reporting module adopts a front-end and back-end separation architecture. It is used to obtain the analysis results data through the RESTful API interface and render and generate statistical charts, dynamic topology views and attack timelines. At the same time, it automatically generates exercise reports based on template engine and placeholder technology.

[0013] Preferably, in the data access module, the parallel instantiation of the protocol adapters specifically includes: for the Syslog protocol, starting a server that listens on UDP port 514 to receive unstructured log streams and, for Syslog carried over TCP, using a line-delimiter-based decoder to handle TCP packet fragmentation and reassembly; for the RESTful interface, starting an embedded lightweight web server that exposes a standard POST API endpoint and receives JSON-formatted alarm data after authenticating the request header; for the Kafka message queue, configuring the gateway as a Kafka consumer group, setting enable.auto.commit=false to disable automatic offset commit, and using a batch pull mechanism to subscribe to data from a specified topic.

[0014] Furthermore, in the data access module, the unified encapsulation and metadata injection process for heterogeneous data is as follows: an internal data object containing a metadata header and a raw payload is created for each raw data packet; the metadata header is dynamically injected with the collection timestamp, data source IP, access protocol type, and interface identifier; the raw payload is a Syslog text string, HTTP body, or Kafka binary stream, stored losslessly in Base64 encoded form.
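The encapsulation described above can be illustrated with a minimal Python sketch. The class name `Envelope`, the function names, and the field names are hypothetical choices made for illustration; the source only specifies that the header carries a collection timestamp, source IP, protocol type, and interface identifier, and that the payload is stored in Base64 form.

```python
import base64
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Envelope:
    """Internal general data object: metadata header plus lossless payload."""
    collected_at: float   # collection timestamp (epoch seconds)
    source_ip: str        # IP address the packet came from
    protocol: str         # access protocol type, e.g. "syslog", "rest", "kafka"
    interface_id: str     # identifier of the receiving interface
    payload_b64: str      # raw payload, Base64-encoded


def encapsulate(raw: bytes, source_ip: str, protocol: str, interface_id: str,
                now: Optional[float] = None) -> Envelope:
    """Wrap one raw packet; the payload bytes are preserved exactly."""
    ts = time.time() if now is None else now
    return Envelope(ts, source_ip, protocol, interface_id,
                    base64.b64encode(raw).decode("ascii"))


def raw_payload(env: Envelope) -> bytes:
    """Recover the original bytes, e.g. for retrospective analysis."""
    return base64.b64decode(env.payload_b64)
```

Because Base64 is reversible, `raw_payload(encapsulate(data, ...))` returns the same bytes for text and binary payloads alike, which is the sense in which the storage is lossless.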

[0015] Furthermore, the specific process executed by the data processing and standardization module includes:

[0016] Step 1: Multi-source heterogeneous data cleaning and parsing. A regular expression library is used to parse unstructured logs, and preset verification rules are applied to discard corrupted data packets that cannot be parsed and to filter out dirty data whose key fields are empty or whose address format does not conform to the IPv4 standard. At the same time, millisecond-level repeated reports from the same event source are deduplicated based on a time sliding window.

[0017] Step 2: Based on the field alignment of the mapping knowledge base, when the data flows through the Map operator, the system queries the built-in heterogeneous field mapping knowledge base according to the device identifier ID in the message header, maps the original heterogeneous field names to standard field names, and converts the numerical protocol code into standard text tags.

[0018] Step 3: Timestamp calibration and clock synchronization. All collected timestamps are uniformly converted to the UTC+0 standard time format, and a transmission delay threshold $\Delta$ is set. When the difference between the time $T_{arrive}$ at which the data packet arrives and the collection time $T_{collect}$ carried in the packet exceeds $\Delta$, the event time $T_{event}$ is corrected based on the average delay constant $T_{delay}$ of the network link, using the following formula:

[0019] $T_{event} = T_{arrive} - T_{delay}$,

[0020] At the same time, the Watermark mechanism of the distributed stream processing framework is used to process out-of-order data;

[0021] Step 4: Construction and tagging of a unified event model. A standardized JSON data structure containing a basic 5-tuple, spatiotemporal attributes, and business attributes is generated. The context enrichment operator is then invoked to associate the event with a geographic location database based on its IP address, tagging the event with its place of origin.

[0022] Furthermore, in the event aggregation and correlation analysis module, the primary aggregation process based on multidimensional tuples is as follows: a time sliding window of a set length is applied; within it, the {source IP address, destination IP address, destination port} triple is extracted from the unified event model stream as the aggregation primary key, and discrete alarm events with the same primary key are grouped into the same event bucket to be analyzed.
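As a rough illustration of the primary aggregation, the following Python sketch groups events by window and triple. For brevity it uses fixed, non-overlapping (tumbling) windows rather than a true sliding window, and the event field names (`ts`, `src_ip`, `dst_ip`, `dst_port`) are assumptions, not names taken from the source.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BucketKey = Tuple[int, str, str, int]


def bucket_events(events: List[dict], window_s: int = 60) -> Dict[BucketKey, List[dict]]:
    """Group events that share a window index and a (src, dst, port) triple."""
    buckets: Dict[BucketKey, List[dict]] = defaultdict(list)
    for ev in sorted(events, key=lambda e: e["ts"]):
        window_index = int(ev["ts"] // window_s)   # which fixed window the event falls in
        key = (window_index, ev["src_ip"], ev["dst_ip"], ev["dst_port"])
        buckets[key].append(ev)
    return dict(buckets)
```

Each resulting list corresponds to one event bucket to be analyzed; events with the same triple but in different windows land in different buckets.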

[0023] Preferably, in the event aggregation and correlation analysis module, the causal correlation matching process based on a preset rule base is as follows: rules are defined in JSON format, including preconditions, postconditions, and time constraints; the matching algorithm traverses the events within the event bucket, and if the temporal logic and attribute constraints defined in a rule are satisfied, it determines that these independent events have a causal relationship and assigns them a unified attack task ID. An example of a rule is: precondition A is that the event type is "…" and the number of occurrences within a given period exceeds the threshold N; intermediate condition B is that the event type is "…" and its time of occurrence is later than that of the events satisfying A; subsequent condition C is that the event type is "…" or "…".
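A JSON rule of this shape and a matcher for it might look like the sketch below. The rule keys (`pre`, `mid`, `post`, `threshold`, `within_s`) and the event type names are hypothetical, since the source text does not preserve the actual rule content; the matcher is one straightforward reading of the A, B, C conditions, not the patented algorithm itself.

```python
import json
import uuid
from typing import List, Optional

# Hypothetical rule; event type names are placeholders for illustration only.
RULE = json.loads("""
{
  "pre":  {"type": "login_failure", "threshold": 3, "within_s": 60},
  "mid":  {"type": "login_success"},
  "post": {"types": ["file_download", "command_execution"]}
}
""")


def match_rule(bucket: List[dict], rule: dict) -> Optional[List[dict]]:
    """Return the matched events tagged with one task ID, or None if no match."""
    events = sorted(bucket, key=lambda e: e["ts"])
    pre_cfg = rule["pre"]
    pre = [e for e in events if e["type"] == pre_cfg["type"]]
    n = pre_cfg["threshold"]
    # Condition A: more than N occurrences inside the allowed period.
    run = next((pre[i:i + n + 1] for i in range(len(pre) - n)
                if pre[i + n]["ts"] - pre[i]["ts"] <= pre_cfg["within_s"]), None)
    if run is None:
        return None
    t_a = run[-1]["ts"]
    # Condition B: the intermediate event must occur after the last A event.
    mid = next((e for e in events
                if e["type"] == rule["mid"]["type"] and e["ts"] > t_a), None)
    if mid is None:
        return None
    # Condition C: one of the follow-up event types, later than B.
    post = next((e for e in events
                 if e["type"] in rule["post"]["types"] and e["ts"] > mid["ts"]), None)
    if post is None:
        return None
    task_id = str(uuid.uuid4())   # unified attack task ID for the whole group
    return [dict(e, task_id=task_id) for e in run + [mid, post]]
```

Keeping the rule as data rather than code means new attack patterns can be added to the rule library without changing the matcher.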

[0024] Furthermore, in the event aggregation and correlation analysis module, the complete process of constructing the event chain is as follows: using the attack task ID as the index and the timestamp as the weight of the directed edge, the associated events are connected in chronological order of occurrence; when a logical blank stage is detected in the event chain, the system automatically inserts a virtual node to issue an alarm, forming a complete attack tracing graph that includes the attack entry point, lateral movement path, and scope of impact.

[0025] Furthermore, in the judgment and decision engine, the data structure definition of the multi-dimensional rule template includes: attack stage dimension weight, set according to scanning detection, attack breakthrough, remote control, and theft and exploitation models; target role dimension coefficient, set according to asset importance; and event base score, which presets a base harm score in the range of 0-100 for each attack type.

[0026] Preferably, in the judgment and decision engine, the configuration of the dynamic strategy set is used to dynamically adjust the judgment logic according to the exercise type; the strategy set configuration table defines: when the exercise type is "red-blue confrontation (real combat mode)", the sensitive mode strategy is enabled, all judgment thresholds are reduced by 20% and the core target weight coefficient is increased; when the exercise type is "CTF competition mode", the scoring mode strategy is enabled, and only the Flag submission event is considered.
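One way to picture the strategy set is as a small configuration table keyed by exercise type. In the sketch below the dictionary keys, the `Strategy` fields, the event type name `flag_submission`, and the core-target boost of 1.5 are all assumptions; only the 20% threshold reduction and the Flag-only filtering come from the text, and the base thresholds 30 and 70 are the classification boundaries stated in [0029].

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Strategy:
    threshold_scale: float = 1.0     # multiplier applied to every judgment threshold
    core_target_boost: float = 1.0   # multiplier for core-target weight coefficients
    flag_events_only: bool = False   # score only Flag submission events


STRATEGIES = {
    # Sensitive mode: thresholds reduced by 20%; the boost value is assumed.
    "red_blue": Strategy(threshold_scale=0.8, core_target_boost=1.5),
    # Scoring mode: only Flag submission events are considered.
    "ctf": Strategy(flag_events_only=True),
}


def effective_thresholds(exercise_type: str,
                         base: Tuple[float, float] = (30.0, 70.0)) -> Tuple[float, ...]:
    """Scale the base risk thresholds according to the active strategy."""
    strategy = STRATEGIES.get(exercise_type, Strategy())
    return tuple(t * strategy.threshold_scale for t in base)


def events_to_score(exercise_type: str, events: List[dict]) -> List[dict]:
    """Apply the strategy's event filter before scoring."""
    strategy = STRATEGIES.get(exercise_type, Strategy())
    if strategy.flag_events_only:
        return [e for e in events if e["type"] == "flag_submission"]
    return list(events)
```

Under these assumptions, sensitive mode moves the medium and high boundaries from 30 and 70 to roughly 24 and 56, so the same event chain is escalated earlier.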

[0027] Furthermore, in the aforementioned analysis and decision-making engine, the automatic risk level determination logic is as follows: when the engine receives an event E to be analyzed, it extracts its attack phase S, target role T, and event type Type, and executes a calculation formula to obtain the comprehensive risk value R:

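[0028] $R = \lambda \cdot W_S \cdot C_T \cdot B_{Type}$,

[0029] where $W_S$ is the attack stage dimension weight for attack phase S, $C_T$ is the target role dimension coefficient for target role T, $B_{Type}$ is the event base score for event type Type, and λ is the exercise type correction coefficient, with a default value of 1. The system compares R with the preset classification threshold ranges: if 0≤R<30, the event is judged as low risk; if 30≤R<70, as medium risk; if R≥70, as high risk.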
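The scoring and classification step can be sketched in Python as follows, assuming the factors are combined as a product of the correction coefficient, the stage weight, the role coefficient, and the base score. The function and parameter names are illustrative; the threshold boundaries are the ones stated above.

```python
def risk_value(stage_weight: float, role_coeff: float, base_score: float,
               lam: float = 1.0) -> float:
    """Comprehensive risk value R, assuming a multiplicative combination."""
    return lam * stage_weight * role_coeff * base_score


def risk_level(r: float, low_upper: float = 30.0, medium_upper: float = 70.0) -> str:
    """Map R onto the three risk levels using half-open intervals."""
    if r < 0:
        raise ValueError("risk value must be non-negative")
    if r < low_upper:
        return "low"
    if r < medium_upper:
        return "medium"
    return "high"
```

With the example figures given later in the description, a port scan (base score 30) in the scanning phase (weight 0.8) against an office terminal (coefficient 0.5) gives R of about 12, which falls in the low band, while an SQL injection (base score 85) in the breakthrough phase (weight 1.2) against a core database server (coefficient 2.0) gives R of about 204, which is high.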

[0030] Furthermore, in the visualization and reporting module, the acquisition and rendering logic of multi-view data includes: statistical chart rendering, where the backend performs an aggregation query from the analysis result database, serializes the attack type distribution and risk level percentage data into JSON format and returns it to the frontend, and the frontend uses a Canvas or SVG engine to map the values into graphic coordinates, colors, and areas; dynamic topology view generation, using a force-directed layout algorithm, defining network asset IPs and attack source IPs as graph nodes, defining attack traffic or relationships as edges, and calculating repulsive and attractive forces based on the strength of relationships between nodes to dynamically generate node coordinates; and attack timeline construction, using timestamps as indexes, arranging discrete events under the same attack task in ascending chronological order, and automatically calculating the time difference Δt between adjacent events. When Δt is less than a set threshold, the events are merged and displayed.
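The timeline merging rule at the end of this paragraph lends itself to a short Python sketch. The field name `ts` and the function name are assumptions; the logic simply sorts the events of one attack task and merges neighbours whose gap Δt is below the threshold.

```python
from typing import List


def merge_timeline(events: List[dict], merge_threshold_s: float) -> List[List[dict]]:
    """Sort events by time and merge adjacent ones closer than the threshold."""
    groups: List[List[dict]] = []
    for ev in sorted(events, key=lambda e: e["ts"]):
        # Compare against the last event of the current group (adjacent events).
        if groups and ev["ts"] - groups[-1][-1]["ts"] < merge_threshold_s:
            groups[-1].append(ev)
        else:
            groups.append([ev])
    return groups
```

Each inner list would be rendered as a single entry on the timeline, which keeps bursts of near-simultaneous alerts from cluttering the view.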

[0031] Preferably, in the visualization and reporting module, the automatic report generation process based on placeholder technology is as follows: A pre-set electronic document template containing standardized placeholder variables is invoked; when a report generation command is triggered, the background task traverses the current analysis dataset and calculates the actual business data corresponding to the placeholders; the template engine reads the data stream of the electronic document template, uses regular expression matching to locate the placeholders, and replaces the corresponding variables with actual text, table data, or Base64 format image streams; after the replacement is completed, the system converts the rendered document object into a binary stream output in PDF or Office Open XML format.
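The placeholder substitution step can be illustrated on a plain-text template. The `{{name}}` placeholder syntax below is an assumption (the source does not state the syntax), and the sketch covers only text replacement, not the conversion to PDF or Office Open XML.

```python
import re
from typing import Mapping

# Assumed placeholder syntax: {{ variable_name }}
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, values: Mapping[str, object]) -> str:
    """Replace each known placeholder with its computed business value."""
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        # Unknown placeholders are left in place so gaps stay visible in the report.
        return str(values[name]) if name in values else match.group(0)
    return PLACEHOLDER.sub(substitute, template)
```

Leaving unresolved placeholders untouched, rather than replacing them with an empty string, is a design choice that makes missing data easy to spot during review.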

[0032] On the other hand, this invention provides an intelligent analysis method for multi-source data from a target range, the specific steps of which are as follows:

[0033] Step S110: Through a unified access gateway based on a pluggable microkernel architecture, multiple protocol adapters are instantiated in parallel to collect multi-source heterogeneous raw data in the test range, and the raw data is uniformly encapsulated and metadata is injected to form an internal general data object.

[0034] Step S120: Based on the distributed stream processing architecture, the internal general data object is cleaned, parsed, field aligned, timestamp calibrated and tagged to output a structured unified event model stream.

[0035] Step S130: Perform multi-dimensional attribute aggregation based on time sliding window on the unified event model stream to form event buckets to be analyzed, and perform causal association matching on the events in the buckets based on the preset time-series association rule base to construct a complete analysis event chain including attack task identifiers;

[0036] Step S140: Based on the preset multi-dimensional rule template and dynamic strategy set, perform multi-factor weighted scoring calculation on the judgment event chain to obtain a comprehensive risk value, and automatically determine the risk level according to the preset threshold range.

[0037] Step S150: Using a front-end and back-end separation architecture, obtain the analysis results data and render and generate statistical charts, dynamic topology views and attack timelines. At the same time, automatically generate an exercise report based on template engine and placeholder technology.

[0038] Compared with the prior art, the beneficial effects of the present invention are:

[0039] (1) Through the unified access gateway and distributed stream processing architecture of the pluggable microkernel, efficient and lossless aggregation and real-time standardized processing of heterogeneous data from multiple protocols such as Syslog, RESTful API, and message queues are realized, solving the problem of information fragmentation caused by scattered data sources and complex formats, and forming a unified, labeled event model stream.

[0040] (2) The multidimensional aggregation based on time sliding window and the causal association matching algorithm based on preset time sequence rules can effectively identify complex attack chains composed of multiple stages and multiple technical means, and construct a complete attack source graph through a logical completion mechanism, thereby improving the intelligent judgment and deep reasoning ability for complex attacks and threshold attacks.

[0041] (3) A judgment and decision engine based on multi-factor weighted scoring and dynamic strategy set was designed, which realizes the accurate mapping of attack event risk from qualitative to quantitative. It can dynamically adjust the judgment threshold and logic according to different exercise modes such as red-blue confrontation and CTF competition, which enhances the scenario adaptability and decision support capability of the judgment results.

[0042] (4) A multi-view visualization interface integrating statistical charts, dynamic topology and timeline was constructed. Combined with the automated report generation technology based on template engine, the visualization, traceability and measurability of the entire offensive and defensive confrontation situation were realized, which effectively supported real-time scheduling and post-event review analysis.

[0043] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, embodiments of the present invention are described below in detail with reference to the accompanying drawings.

Attached Figure Description

[0044] To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of the present invention and should not be regarded as a limitation on the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0045] Figure 1 is a schematic diagram of the overall module architecture and data flow of the multi-source data intelligent analysis system for the target range in this embodiment of the invention;

[0046] Figure 2 is a flowchart illustrating the logic of attack chain construction based on multidimensional aggregation and timing rules in an embodiment of the present invention.

Detailed Implementation

[0047] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are some embodiments of the present invention, but not all embodiments.

[0048] In red-blue team exercises organized in large-scale cybersecurity ranges, attackers employ multi-stage, multi-technical composite attack strategies, while defenders deploy various heterogeneous security monitoring devices, including host security agents, network intrusion detection systems, and log auditing platforms. The raw data generated by these devices varies in format and protocol, including system logs from attacking and defending machines, traffic alerts captured by network monitoring nodes, and application logs aggregated by a centralized log platform. Traditionally, security analysts need to log into different systems separately to view alerts, manually perform time alignment and logical correlation, which is inefficient and prone to missing hidden attack chains. The multi-source data intelligent analysis system for ranges provided by this invention aims to achieve real-time aggregation, standardized processing, intelligent correlation analysis, and visualization of the aforementioned multi-source heterogeneous data, providing precise decision support for exercise command and debriefing.

[0049] Referring to Figure 1, the system includes a data access module, a data processing and standardization module, an event aggregation and correlation analysis module, a decision-making engine, and a visualization and reporting module. These modules work together to form a complete closed loop from data acquisition to intelligent decision output.

[0050] The data access module is built on a pluggable microkernel architecture to construct a unified access gateway. The core function of this gateway is to instantiate multiple protocol adapters in parallel to establish independent, high-concurrency communication channels with various data sources within the test range. For Syslog protocol data from network or security devices, the gateway starts a dedicated server-side listening process bound to UDP port 514, while also supporting TCP port 514 for reliability. For TCP connections, the system uses a line-delimiter-based decoder to dynamically detect message boundaries, handling potential TCP packet fragmentation and reassembly during network transmission so that every unstructured log text is received completely. For application systems that need to report data via a programming interface, the gateway embeds a lightweight web server, exposing a standard RESTful API endpoint, such as /api/v1/alert. Any client sending a POST request to this endpoint first has the authentication token in its request header verified; only JSON-formatted alert data that passes authentication is received and proceeds to the next step. For large-scale data streams that use message queues for asynchronous decoupling, such as real-time alerts from full-traffic probes, the gateway configures itself as a consumer group instance of Apache Kafka. To ensure exactly-once semantics in data processing, the gateway sets the `enable.auto.commit` parameter to `false`, disabling the automatic offset commit function for consumers and instead committing offsets manually only after the business logic has been processed successfully. Simultaneously, the gateway employs a batch fetch mechanism, pulling multiple messages from a specified topic at once to improve throughput.
All raw data accessed through the aforementioned adapter undergoes a unified encapsulation and metadata injection process before entering the system's internal flow. The system dynamically creates an internal generic data object for each raw data packet. This object contains a structured metadata header and a raw payload. The metadata header is injected the instant the data enters the gateway, and its fields include: the system timestamp of the data packet arriving at the gateway, the source IP address that sent the data packet, the access protocol type used, and the specific interface identifier. The raw payload, regardless of its original form as a Syslog text string, an HTTP request body, or a binary message stream in Kafka, is losslessly converted to a Base64 encoded string for storage, ensuring the integrity of the original information and providing a foundation for possible retrospective analysis of the raw data.
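The line-delimiter-based decoding mentioned for Syslog over TCP in this paragraph can be sketched as a small stateful buffer. This is an illustrative Python sketch with a hypothetical class name, assuming newline-terminated records; it is independent of any particular network framework.

```python
from typing import List


class LineDecoder:
    """Reassembles delimiter-terminated records from arbitrary TCP chunks."""

    def __init__(self, delimiter: bytes = b"\n") -> None:
        self._delimiter = delimiter
        self._buffer = b""

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add received bytes; return every record completed by this chunk."""
        self._buffer += chunk
        # Everything before the last delimiter is complete; the tail is kept.
        *complete, self._buffer = self._buffer.split(self._delimiter)
        # Strip a trailing carriage return and drop empty records.
        return [line.rstrip(b"\r") for line in complete if line.strip()]
```

Because TCP delivers a byte stream rather than discrete messages, one `feed` call may return zero, one, or several records; the unfinished tail stays buffered until its delimiter arrives.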

[0051] The data processing and standardization module is built on a distributed stream processing architecture and is responsible for real-time cleaning, parsing, and standardization of the internal general data object stream output by the data access module. The execution flow of this module specifically includes four core steps.

[0052] Step 1: Multi-source heterogeneous data cleaning and parsing. The system first decodes and parses the raw payload. For unstructured Syslog text, the system calls a pre-loaded regular expression library for pattern matching to extract key fields such as time, hostname, process name, priority, and message body. Simultaneously, the system applies a set of pre-defined validation rules, such as checking if key fields are empty, whether the IP address field conforms to IPv4 or IPv6 standards, and whether the timestamp format is parsable. Any data packet that fails parsing or validation is marked as corrupted and immediately discarded, with the reason for discard recorded in the audit log. Furthermore, for situations where the same event source may repeatedly report the same data within milliseconds, the system performs deduplication based on the data source IP and event signature within a very short time sliding window, retaining only the first valid record.
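The validation and deduplication rules in this step can be sketched as follows. The field names (`src_ip`, `timestamp_ms`, `signature`), the 50 ms default window, and the class name are assumptions for illustration; the IP check relies on Python's standard `ipaddress` module, which accepts both IPv4 and IPv6.

```python
import ipaddress
from typing import Dict, Tuple


def is_valid(event: dict) -> bool:
    """Reject records with empty key fields or a malformed IP address."""
    for key in ("src_ip", "timestamp_ms", "signature"):
        if event.get(key) in (None, ""):
            return False
    try:
        ipaddress.ip_address(event["src_ip"])   # raises ValueError if malformed
    except ValueError:
        return False
    return True


class Deduplicator:
    """Keeps the first record per (source IP, signature) within a short window."""

    def __init__(self, window_ms: int = 50) -> None:
        self._window_ms = window_ms
        self._first_seen: Dict[Tuple[str, str], int] = {}

    def accept(self, event: dict) -> bool:
        key = (event["src_ip"], event["signature"])
        ts = event["timestamp_ms"]
        first = self._first_seen.get(key)
        if first is not None and 0 <= ts - first < self._window_ms:
            return False            # repeat inside the window: drop it
        self._first_seen[key] = ts  # a new window starts at this record
        return True
```

A production version would also evict old keys from `_first_seen` and write the discard reason to the audit log, as the text describes; both are omitted here to keep the sketch short.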

[0053] Step 2: Field alignment based on the mapping knowledge base. The initially cleaned data stream enters a mapping operator. This operator queries a built-in heterogeneous field mapping knowledge base based on the device identifier ID in the message metadata header. This knowledge base, indexed by device model and log type, defines the mapping relationships from the diverse field names in the raw logs to the system's internal standard field names. For example, the "src" field in the logs of a certain firewall model is mapped to the standard field "…", and the "…" field in the logs of another intrusion detection system is mapped to "…". Meanwhile, for numerical codes in the protocol, such as TCP flags and HTTP status codes, the mapping operator converts them into more easily understood standard text labels, such as mapping the number "80" to the label "HTTP".
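A mapping operator of this kind reduces to a dictionary lookup, as the following Python sketch shows. The device keys, the standard field names (`src_ip`, `dst_ip`, `dst_port`, `service`), and the label table are all hypothetical, since the actual names are not legible in the source text.

```python
from typing import Dict, Tuple

# Hypothetical mapping knowledge base, keyed by (device model, log type).
FIELD_MAP: Dict[Tuple[str, str], Dict[str, str]] = {
    ("fw-model-x", "traffic"): {"src": "src_ip", "dst": "dst_ip", "dpt": "dst_port"},
    ("ids-model-y", "alert"): {"source_address": "src_ip", "dest_address": "dst_ip"},
}

# Hypothetical numeric-code-to-label table.
PORT_LABELS: Dict[int, str] = {22: "SSH", 80: "HTTP", 443: "HTTPS"}


def align_fields(device_key: Tuple[str, str], record: dict) -> dict:
    """Rename vendor-specific fields and attach a text label for known codes."""
    mapping = FIELD_MAP.get(device_key, {})
    # Fields without a mapping entry keep their original name.
    aligned = {mapping.get(name, name): value for name, value in record.items()}
    port = aligned.get("dst_port")
    if port in PORT_LABELS:
        aligned["service"] = PORT_LABELS[port]
    return aligned
```

Keeping the mappings in a table means support for a new device model is a data change rather than a code change, which fits the pluggable design described for the gateway.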

[0054] Step 3: Timestamp calibration and clock synchronization. To ensure all events are analyzed under a unified time base, the system converts the acquisition timestamps carried by all events to the UTC+0 standard time format. Considering network transmission latency, the system sets a transmission latency threshold $\Delta$, for example, 5 seconds. When the difference between the time $T_{arrive}$ at which the data packet arrives at the system and the acquisition time $T_{collect}$ carried by the data packet itself exceeds $\Delta$, the system determines that the data packet has experienced abnormal latency. In this case, the system corrects the event time $T_{event}$ based on the historical average latency constant $T_{delay}$ of the network link from which the data originated, using the formula $T_{event} = T_{arrive} - T_{delay}$. This makes the event time closer to the actual moment of occurrence. To handle out-of-order data caused by network jitter and other factors, this module uses the Watermark mechanism of the underlying distributed stream processing framework, allowing events that arrive late within a certain time range to be processed correctly and ensuring the accuracy of time-window-based calculations.
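The calibration rule can be sketched in Python under one stated assumption: when the gap between arrival time and carried acquisition time exceeds the threshold, the event time is taken as the arrival time minus the link's average latency. The function and parameter names are illustrative, and both inputs are assumed to be timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone


def calibrate(collected_at: datetime, arrived_at: datetime,
              threshold: timedelta, avg_link_delay: timedelta) -> datetime:
    """Return the event time in UTC, corrected when the delay looks abnormal."""
    collected_utc = collected_at.astimezone(timezone.utc)
    arrived_utc = arrived_at.astimezone(timezone.utc)
    if abs(arrived_utc - collected_utc) > threshold:
        # Abnormal gap: estimate the event time from arrival time instead.
        return arrived_utc - avg_link_delay
    return collected_utc
```

For example, with a 5-second threshold and a 200 ms average link latency, a packet stamped 08:00:00 at UTC+8 that arrives at 00:00:02 UTC keeps its own timestamp (00:00:00 UTC), whereas one that arrives at 00:00:30 UTC is re-stamped to 00:00:29.8 UTC.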

[0055] Step 4: Construction and tagging of the unified event model. The data processed above is encapsulated into a structured unified event model. This model uses JSON format and includes a basic 5-tuple, spatiotemporal attributes, and extended business attributes. The basic 5-tuple includes source IP address, destination IP address, source port, destination port, and transport layer protocol. Spatiotemporal attributes include a calibrated UTC timestamp and event duration. Business attributes include event type, severity level, and original message digest. Before output, the system also calls a context enrichment operator to dynamically add a location tag, such as "China-Beijing," to the event based on the IP address information in the event and by associating it with an external IP geolocation database, thus enriching the event's contextual information. Finally, this module outputs a continuous and well-structured unified event model stream.

[0056] The event aggregation and correlation analysis module receives the unified event model stream described above. Its core task is to aggregate discrete alert events into meaningful attack activities and identify the causal relationships between them. Referring to Figure 2, the process begins with primary aggregation based on multidimensional tuples. The system sets a sliding time window of a given length, such as 60 seconds. Within this window, the system extracts the {source IP address, destination IP address, destination port} triple from each standardized event as the aggregation primary key. All discrete alarm events occurring within the window and having the same primary key are grouped into the same "event bucket to be analyzed". For example, within 60 seconds, multiple independent events from one source IP address to port 22 of one destination IP address, such as "SSH brute-force attempt," "SSH login failure," and "SSH login success," are aggregated into a single bucket. Next, the system uses a pre-built temporal association rule library to perform causal correlation matching on the events within the bucket. The association rules are defined in JSON format, clearly describing the logical relationships, attribute constraints, and temporal requirements of the steps in the attack chain. A typical multi-stage attack rule may include preconditions, intermediate conditions, and postconditions. For example, the rule definition might be: precondition A is that the event type equals "…" and the number of occurrences within the time window exceeds the threshold N; intermediate condition B is that the event type equals "…" and its time of occurrence is later than the time of the last event that satisfies condition A; subsequent condition C is that the event type equals "…" or "…" and that it occurred later than the event satisfying condition B. The matching algorithm traverses all events within the event bucket.
If it finds a set of events that perfectly matches all the conditions and temporal logic defined by a certain rule, it determines that there is a strong causal relationship between these events, and they are not isolated alerts. The system assigns a globally unique attack task ID to this set of related events. Finally, a complete event chain is constructed. The system uses the attack task ID as an index and the event timestamps as the order to connect causally related events into a directed chain in chronological order. During the construction process, if the system detects logical gaps in the event chain at critical stages, such as the absence of the common "command execution" event after successful login and before file download, the system will not simply ignore it but will automatically insert a virtual node at the corresponding time position and mark it as a "suspicious activity gap," alerting the analysts. This forms a complete attack tracing diagram that includes the initial entry point of the attack, the internal lateral movement path, the final scope of impact, and possible hidden links.
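
The aggregation, rule matching, and chain construction described in this paragraph can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the window is anchored at the first event of each bucket rather than sliding continuously, the rule's event type names and threshold are hypothetical (the source elides them), the virtual node is placed at the midpoint between its neighbours, and all function and field names (`bucket_events`, `match_rule`, `build_chain`, `ts`, `type`) are invented for this example.

```python
import uuid

WINDOW_S = 60  # window length in seconds (the example value from the text)

# Hypothetical rule: event type names and the threshold are illustrative only.
RULE = {
    "pre": {"type": "SSH login failure", "threshold": 3},
    "mid": {"type": "SSH login success"},
    "post": {"types": ["command execution", "file download"]},
}
# Hypothetical gap rule: (earlier stage, expected stage, later stage).
GAP_RULES = [("SSH login success", "command execution", "file download")]


def bucket_events(events, window_s=WINDOW_S):
    """Group events by (src_ip, dst_ip, dst_port) within window_s of the bucket's first event."""
    buckets, open_buckets = [], {}
    for ev in sorted(events, key=lambda e: e["ts"]):
        key = (ev["src_ip"], ev["dst_ip"], ev["dst_port"])
        bucket = open_buckets.get(key)
        if bucket is None or ev["ts"] - bucket[0]["ts"] > window_s:
            bucket = []
            open_buckets[key] = bucket
            buckets.append(bucket)
        bucket.append(ev)
    return buckets


def match_rule(bucket, rule):
    """Match A (count > threshold), then B after the last A, then C after B; else None."""
    a_events = [e for e in bucket if e["type"] == rule["pre"]["type"]]
    if len(a_events) <= rule["pre"]["threshold"]:
        return None
    last_a = max(e["ts"] for e in a_events)
    b = next((e for e in bucket
              if e["type"] == rule["mid"]["type"] and e["ts"] > last_a), None)
    if b is None:
        return None
    c = next((e for e in bucket
              if e["type"] in rule["post"]["types"] and e["ts"] > b["ts"]), None)
    if c is None:
        return None
    return a_events + [b, c]


def build_chain(matched, gap_rules=GAP_RULES):
    """Order matched events by time, insert virtual gap nodes, and assign one task ID."""
    task_id = str(uuid.uuid4())
    ordered = sorted(matched, key=lambda e: e["ts"])
    present = {e["type"] for e in ordered}
    chain = []
    for ev in ordered:
        for before, expected, after in gap_rules:
            seen_before = any(n["type"] == before for n in chain)
            if ev["type"] == after and expected not in present and seen_before:
                # Assumption: place the virtual node midway between its neighbours.
                chain.append({"type": "suspicious activity gap", "virtual": True,
                              "ts": (chain[-1]["ts"] + ev["ts"]) / 2})
        chain.append(dict(ev))
    for node in chain:
        node["task_id"] = task_id
    return chain


def _ev(ts, etype, src="198.51.100.9"):
    return {"ts": ts, "type": etype, "src_ip": src, "dst_ip": "10.0.0.5", "dst_port": 22}


EVENTS = [_ev(t, "SSH login failure") for t in (0, 2, 4, 6, 8)]
EVENTS += [_ev(10, "SSH login success"), _ev(30, "file download"),
           _ev(5, "port scanning", src="192.0.2.1")]

if __name__ == "__main__":
    for bucket in bucket_events(EVENTS):
        matched = match_rule(bucket, RULE)
        if matched:
            for node in build_chain(matched):
                print(node["ts"], node["type"])
```

In the sample data the login success is followed directly by a file download, so the chain gains one virtual "suspicious activity gap" node between them, which is the behaviour the paragraph describes for a missing "command execution" stage.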

[0057] The analysis and decision-making engine is responsible for quantitatively assessing and classifying the risks of the constructed event chain. The engine's core is based on multi-dimensional rule templates and a dynamic strategy set. The multi-dimensional rule templates define the dimensions and base scores for risk calculation. Attack phase dimension weights are set according to the attack chain model; for example, the weight for the "scanning and probing" phase is set to 0.8, the "attack breakthrough" phase to 1.2, the "remote control" phase to 1.5, and the "theft and exploitation" phase to 2.0, reflecting the differences in the severity of each phase. The target role dimension coefficient is dynamically adjusted based on asset importance; for example, the coefficient for a core database server is 2.0, while the coefficient for a regular office terminal is 0.5. The event base score is preset with a score between 0 and 100 for each specific attack type; for example, the base score for "SQL injection" is 85, and the base score for "port scanning" is 30. The dynamic strategy set is used to adapt the engine to different training scenarios. The strategy set is defined through a configuration table. For example, when the exercise type is configured as "Red-Blue Team Competition," the "Sensitive Mode" strategy is enabled. This strategy reduces all judgment thresholds by 20% and significantly increases the weight coefficient of core targets, aiming to discover potential threats with stricter standards. When the exercise type is configured as "CTF Competition," the "Scoring Mode" strategy is enabled. This strategy filters out most attack events and only focuses on events directly related to "Flag Submission" for scoring. When the engine receives an event chain E to be analyzed, its automatic judgment logic begins execution. 
The engine first extracts the main attack stage S to which the event chain belongs, the target role T of the main attack, and the most serious event type Type in the chain. Subsequently, it executes the formula for calculating the comprehensive risk value R:

[0058]

[0059] Here, the formula combines the base score for the event type, the weight for the attack phase, the target role coefficient, and the exercise type correction coefficient λ. In "Sensitive Mode," λ may be 1.2, and in "Scoring Mode," λ may be 0.1; the default value is 1. After calculating the R value, the system compares it with a preset grading threshold range. For example, the range is set as follows: 0 ≤ R < 30 is considered low risk, 30 ≤ R < 70 is considered medium risk, and R ≥ 70 is considered high risk. Based on the comparison result, the engine automatically assigns a final risk level label to the event chain.
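
The scoring and grading logic can be sketched in Python as follows. The formula itself is not reproduced in this text, so the sketch assumes a purely multiplicative combination of the four factors; that assumption is suggested by the default value of 1 for λ but is not confirmed by the source. The table values are the examples given in the description, while the names `risk_value` and `risk_level` are hypothetical. The 20% threshold reduction of "Sensitive Mode" is not modelled here.

```python
# Example values quoted in the description.
PHASE_WEIGHT = {
    "scanning and probing": 0.8,
    "attack breakthrough": 1.2,
    "remote control": 1.5,
    "theft and exploitation": 2.0,
}
ROLE_COEFF = {"core database server": 2.0, "regular office terminal": 0.5}
BASE_SCORE = {"SQL injection": 85, "port scanning": 30}
MODE_LAMBDA = {"Sensitive Mode": 1.2, "Scoring Mode": 0.1}


def risk_value(event_type, phase, role, mode=None):
    """Comprehensive risk value R.

    ASSUMPTION: R is the product of the base score, phase weight, role
    coefficient and lambda; the source formula is not available.
    """
    lam = MODE_LAMBDA.get(mode, 1.0)  # default correction coefficient is 1
    return BASE_SCORE[event_type] * PHASE_WEIGHT[phase] * ROLE_COEFF[role] * lam


def risk_level(r):
    """Map R to the example grading ranges: <30 low, <70 medium, otherwise high."""
    if r < 30:
        return "low risk"
    if r < 70:
        return "medium risk"
    return "high risk"


if __name__ == "__main__":
    r = risk_value("SQL injection", "attack breakthrough",
                   "core database server", mode="Sensitive Mode")
    print(round(r, 1), risk_level(r))
```

Under this multiplicative assumption, a port scan against an office terminal stays in the low range, while an SQL injection against a core database server in "Sensitive Mode" lands well above the high-risk threshold.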

[0060] The visualization and reporting module adopts a front-end and back-end separation architecture, providing users with intuitive results presentation and report output capabilities. This module retrieves analysis results data from the back-end database through a set of RESTful API interfaces. For statistical chart rendering, the back-end executes aggregation queries to calculate data such as "distribution of attack types in the past hour" and "percentage of current active event risk levels," and serializes it into JSON format before returning it to the front-end. The front-end uses a Canvas or SVG graphics engine to map numerical values to graphical elements, such as using the area of a pie chart's sector to represent the percentage, the height of a bar chart to represent the quantity, and the color intensity to represent the level of risk. For dynamic topology view generation, the system abstracts network asset IPs and attack source IPs as graph nodes, and attack traffic or event relationships as edges. The front-end uses a force-directed layout algorithm to calculate attraction and repulsion based on the connection relationships between nodes, dynamically generating the coordinate positions of nodes so that closely related nodes cluster together and isolated nodes move away, thus intuitively displaying the focal point and spread path of the attack. The attack timeline construction is based on a time axis, arranging all discrete events belonging to the same attack task ID in ascending order according to their calibrated timestamps. The system automatically calculates the time difference Δt between adjacent events. When Δt is less than a set threshold, the front-end interface merges these events into a single time block to avoid overcrowding the timeline. Users can expand the time block to view details by clicking on it. Furthermore, automatic report generation based on placeholder technology is another core function of this module. 
The system pre-builds a Word-format report template conforming to exercise debriefing specifications, using standard placeholder variables where dynamic data needs to be populated. When the commander triggers the report generation command, a background task is initiated. This task iterates through all analysis result datasets within the current time range, performs complex statistical and aggregation calculations, and derives the actual business data corresponding to each placeholder variable. This data may be plain text, structured tables, or Base64 format image streams generated from charts. Subsequently, the template engine reads the template file, uses regular expressions to accurately locate all placeholders, and replaces them with the calculated actual data. After replacement, the system calls a document conversion library to convert the final document stream into a PDF or Docx binary stream, providing it to the user for download or direct delivery to a specified storage location.
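
The placeholder replacement step can be illustrated with a short Python sketch. It works on a plain text string rather than a Word template, and it stops before the PDF or DOCX conversion, so it shows only the regular-expression substitution. The `{{name}}` placeholder syntax and the `render` function are assumptions; the source does not specify the placeholder format.

```python
import re

# ASSUMPTION: placeholders look like {{name}}; the source does not give the syntax.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, values):
    """Replace each {{name}} with its computed value; leave unknown names untouched."""
    def substitute(match):
        name = match.group(1)
        if name in values:
            return str(values[name])
        return match.group(0)  # keep the placeholder so missing data is visible
    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    template = "Exercise {{ exercise }}: {{total}} events, {{high}} high risk."
    print(render(template, {"exercise": "Alpha", "total": 42, "high": 3}))
```

Leaving unmatched placeholders in the output, rather than replacing them with an empty string, is a design choice of this sketch that makes an incomplete dataset easy to spot in the generated report.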

[0061] In this embodiment, the specific steps of the intelligent analysis method for multi-source data from the target range correspond completely to the system workflow, specifically including:

[0062] Step S110: Through a unified access gateway based on a pluggable microkernel architecture, protocol adapters for Syslog, RESTful API, and Kafka are instantiated in parallel to collect the multi-source heterogeneous raw data generated by attack machines, defense machines, monitoring nodes, and log platforms in the target range; each piece of raw data is uniformly encapsulated and injected with metadata to form an internal general data object.

[0063] Step S120, based on the distributed stream processing architecture, performs cleaning, parsing, field alignment, timestamp calibration and tagging processing on the internal general data object stream, specifically covering data cleaning and verification, field mapping, time correction and context enrichment sub-steps, and outputs a structured unified event model stream.

[0064] Step S130: Perform multi-dimensional attribute aggregation on the unified event model stream based on a sliding time window, form event buckets to be analyzed with {source IP, destination IP, destination port} as keys, perform causal logic matching on the events in the buckets based on the preset time-series association rule base to construct a complete analysis event chain containing a unique attack task identifier, and insert virtual prompt nodes for logical blank stages.

[0065] Step S140: Based on the preset multi-dimensional rule template and dynamic strategy set, extract the attack stage, target role and event type factors of the event chain, execute the multi-factor weighted scoring calculation formula, obtain the comprehensive risk value, and automatically determine its risk level as low risk, medium risk or high risk according to the preset threshold range.

[0066] Step S150: Obtain the analysis results data through the RESTful API of the front-end and back-end separated architecture. The front-end renders and generates statistical charts, dynamic topology views, and attack timelines. At the same time, the back-end automatically populates the data and generates a drill debriefing report document based on template engine and placeholder technology. The topology view is dynamically generated based on the strength of the relationship between nodes using a force-directed layout algorithm, and the attack timeline is constructed by merging discrete events that are close in time.
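
The timeline merging mentioned in step S150 can be sketched as follows: events are sorted by timestamp, and an event joins the current time block when its gap Δt to the previous event is below a threshold. The function name `merge_timeline`, the `ts` field, and the threshold value used in the example are assumptions for illustration.

```python
def merge_timeline(events, threshold_s):
    """Group time-ordered events into blocks where adjacent gaps are < threshold_s."""
    blocks = []
    for ev in sorted(events, key=lambda e: e["ts"]):
        # Delta t is measured against the last event of the current block.
        if blocks and ev["ts"] - blocks[-1][-1]["ts"] < threshold_s:
            blocks[-1].append(ev)
        else:
            blocks.append([ev])
    return blocks


if __name__ == "__main__":
    sample = [{"ts": t} for t in (0, 1, 2, 10, 11, 30)]
    for block in merge_timeline(sample, threshold_s=5):
        print([e["ts"] for e in block])
```

Each returned block corresponds to one collapsed time block on the front-end, which the user can expand to see the individual events.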

[0067] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

Claims

1. A multi-source data intelligent analysis system for a target range, characterized in that it comprises: a data access module, which builds a unified access gateway on a pluggable microkernel architecture, is used to instantiate multiple protocol adapters in parallel to establish independent communication channels with attack machines, defense machines, monitoring nodes, and log platforms in the target range, and performs unified encapsulation and metadata injection on the raw data accessed through different protocols to form internal general data objects; a data processing and standardization module, built on a distributed stream processing architecture, which is used to clean, parse, field-align, timestamp-calibrate, and tag the internal general data objects and to output a structured unified event model stream; an event aggregation and correlation analysis module, which is used to perform multi-dimensional attribute aggregation on the unified event model stream based on a sliding time window to form event buckets to be analyzed, and to perform causal correlation matching on the events in the buckets based on a preset time-series correlation rule library to construct a complete analysis event chain including attack task identifiers; an analysis and decision engine, which is used to perform a multi-factor weighted scoring calculation on the analysis event chain based on a preset multi-dimensional rule template and dynamic strategy set to obtain a comprehensive risk value, and to automatically determine the risk level based on a preset threshold range; and a visualization and reporting module, which adopts a front-end and back-end separation architecture, is used to obtain the analysis result data through a RESTful API and to render statistical charts, dynamic topology views, and attack timelines, and automatically generates exercise reports based on a template engine and placeholder technology.

2. The intelligent analysis system for multi-source data of a target range according to claim 1, characterized in that, in the data access module, the parallel instantiation of the protocol adapters specifically includes: for the Syslog protocol, starting a server that listens on port UDP/514 and uses a line-delimiter-based decoder to handle TCP packet fragmentation and reassembly, so as to receive unstructured log streams; for the RESTful interface, starting an embedded lightweight web server that exposes a standard POST API endpoint and receives JSON-formatted alarm data after authenticating and verifying the request header; and for the Kafka message queue, configuring the gateway as a Kafka consumer group, setting enable.auto.commit=false to disable automatic commit, and using a batch pull mechanism to subscribe to data for a specified topic.

3. The intelligent analysis system for multi-source data of a target range according to claim 1, characterized in that the specific process executed by the data processing and standardization module includes: Step 1: multi-source heterogeneous data cleaning and parsing, in which a regular expression library is used to parse unstructured logs, preset verification rules are applied to discard corrupted data packets that cannot be parsed, dirty data with empty key fields or with formats that do not conform to IP address standards is filtered out, and millisecond-level repeated reports from the same event source are deduplicated based on a sliding time window; Step 2: field alignment based on a mapping knowledge base, in which, when the data flows through the Map operator, the system queries the built-in heterogeneous field mapping knowledge base according to the device identifier ID in the message header, maps the original heterogeneous field names to standard field names, and converts numerical protocol codes into standard text tags; Step 3: timestamp calibration and clock synchronization, in which all collected timestamps are uniformly converted to the UTC+0 standard time format and a transmission delay threshold Δ is set; when the difference between the arrival time of a data packet and the collection time it carries exceeds Δ, the occurrence time of the event is corrected by a formula based on the average delay constant of the network link; meanwhile, the Watermark mechanism of the distributed stream processing framework is used to process out-of-order data; Step 4: construction and tagging of the unified event model, in which a standardized JSON data structure containing a basic 5-tuple, spatiotemporal attributes, and business attributes is generated, and the context enrichment operator is invoked to look up the IP address in a geographic location database and tag the event with its location.

4. The intelligent analysis system for multi-source data of a target range according to claim 1, characterized in that, in the event aggregation and correlation analysis module, the primary aggregation process based on multidimensional tuples is as follows: a sliding time window of a set length is configured; the {source IP address, destination IP address, destination port} triple is extracted from the unified event model stream as the aggregation primary key; and discrete alarm events with the same primary key are grouped into the same event bucket to be analyzed.

5. The intelligent analysis system for multi-source data of a target range according to claim 1, characterized in that, in the analysis and decision engine, the automatic risk level determination logic is as follows: when the engine receives an event chain E to be analyzed, it extracts its attack phase S, target role T, and event type Type, and executes a calculation formula to obtain the comprehensive risk value R, the formula combining the base score for the corresponding event type, the weight for the attack phase, the target role coefficient, and the exercise type correction coefficient λ; the system then compares R with the preset classification threshold range to obtain a determination of low risk, medium risk, or high risk.

6. The intelligent analysis system for multi-source data of a target range according to claim 1, characterized in that, In the visualization and reporting module, the automatic report generation process based on placeholder technology is as follows: a preset electronic document template containing standardized placeholder variables is called; when a report generation instruction is triggered, the background task traverses the current analysis dataset and calculates the actual business data corresponding to the placeholders; the template engine reads the data stream of the electronic document template, uses regular expression matching to locate the placeholders, and replaces the corresponding variables with actual text, table data, or Base64 format image streams; After the replacement is complete, the system will convert the rendered document object into a binary stream output in PDF or Office Open XML format.

7. A method for intelligent analysis of multi-source data from a target range, characterized in that the specific steps include: Step S110: through a unified access gateway based on a pluggable microkernel architecture, instantiating multiple protocol adapters in parallel to collect multi-source heterogeneous raw data in the target range, and uniformly encapsulating the raw data and injecting metadata to form internal general data objects; Step S120: based on a distributed stream processing architecture, cleaning, parsing, field-aligning, timestamp-calibrating, and tagging the internal general data objects to output a structured unified event model stream; Step S130: performing multi-dimensional attribute aggregation on the unified event model stream based on a sliding time window to form event buckets to be analyzed, and performing causal association matching on the events in the buckets based on a preset time-series association rule base to construct a complete analysis event chain including attack task identifiers; Step S140: based on a preset multi-dimensional rule template and dynamic strategy set, performing a multi-factor weighted scoring calculation on the analysis event chain to obtain a comprehensive risk value, and automatically determining the risk level according to a preset threshold range; Step S150: using a front-end and back-end separation architecture, obtaining the analysis result data and rendering statistical charts, dynamic topology views, and attack timelines, while automatically generating an exercise report based on a template engine and placeholder technology.

8. The intelligent analysis method for multi-source data of a target range according to claim 7, characterized in that, In step S110, the process of uniformly encapsulating and injecting metadata into the raw data is as follows: an internal data object containing a metadata header and a raw payload is created for each raw data packet; the metadata header is dynamically injected with the collection timestamp, data source IP, access protocol type, and interface identifier; the raw payload is a Syslog text string, HTTP body, or Kafka binary stream, stored losslessly in Base64 encoded form.

9. The intelligent analysis method for multi-source data of a target range according to claim 7, characterized in that, In step S130, the process of constructing a complete analysis event chain containing attack task identifiers is as follows: using the attack task ID as an index and the timestamp as the weight of the directed edge, the associated events are connected in chronological order of occurrence; when there is a logical blank stage in the detected event chain, the system automatically inserts a virtual node to issue an alarm, forming a complete attack tracing diagram containing the attack entry point, lateral movement path, and scope of impact.

10. The intelligent analysis method for multi-source data of a target range according to claim 7, characterized in that, in step S150, the logic for rendering and generating the dynamic topology view and attack timeline includes: generating the dynamic topology view by using a force-directed layout algorithm, defining network asset IPs and attack source IPs as graph nodes, defining attack traffic or relationships as edges, and calculating repulsive and attractive forces based on the strength of relationships between nodes to dynamically generate node coordinates; and constructing the attack timeline by using timestamps as indexes to arrange discrete events under the same attack task in ascending chronological order, automatically calculating the time difference Δt between adjacent events, and displaying the events in merged form when Δt is less than a set threshold.