Data collection method and device for internet simulation scene and storage medium

By classifying and associating vulnerabilities in internet simulation scenarios, and combining attack model frameworks and traffic detection rule bases for targeted data collection, the problem of high resource consumption in existing technologies is solved, and efficient intrusion detection is achieved.

CN117278245B · Active · Publication Date: 2026-06-30 · PENG CHENG LAB +2

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
PENG CHENG LAB
Filing Date
2023-07-26
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing network intrusion detection systems consume a lot of resources in Internet simulation scenarios, and the differentiated alarms generated by various intrusion detection devices are not conducive to unified analysis of intrusion behavior.

Method used

By scanning and classifying vulnerabilities in internet simulation scenarios, a vulnerability cluster is established and associated with basic attacks. Targeted data collection is then carried out using an attack model framework and a traffic detection rule base, reducing system resource consumption.

Benefits of technology

It enables targeted range acquisition in Internet simulation scenarios, reducing system resource consumption and detection time overhead, and improving the efficiency and accuracy of intrusion detection.



Abstract

This application discloses a data acquisition method, apparatus, and storage medium for internet simulation scenarios. The method includes: performing vulnerability scanning on the internet simulation scenario and classifying the vulnerabilities obtained to obtain a vulnerability set; associating the vulnerability set with a preset attack type classification dataset to obtain basic attacks corresponding to the vulnerability set, wherein the attack type classification dataset includes the correspondence between vulnerabilities and intrusion attack behaviors; determining the terminal-side data source based on the basic attacks and a preset attack model framework, and determining the traffic detection rules on the traffic side based on the basic attacks and a preset traffic detection rule base, wherein the attack model framework includes the correspondence between attack types and data sources; and performing targeted data acquisition based on the correspondence between the vulnerability set, data source, and traffic detection rules. In this embodiment of the invention, data acquisition can be performed only on vulnerabilities or weaknesses to support intrusion detection, thereby reducing the system overhead of data acquisition.

Description

Technical Field

[0001] This application relates to the field of network attack and defense drill technology, and in particular to a data acquisition method, device and storage medium for Internet simulation scenarios.

Background Technology

[0002] In internet simulation environments, attack and defense drills need to be monitored. This includes monitoring the specific operations of both attackers and defenders, analyzing attack and defense behaviors, and understanding the mechanisms behind them. Monitoring the attacker's specific behavior is essentially a matter of detecting network intrusion. To monitor and detect such intrusions, it is generally necessary to build an intrusion detection system within the scenario, collect network security data, and analyze and detect intrusion behaviors. To improve the accuracy and comprehensiveness of detection, multiple intrusion detection methods are typically employed, such as building a network intrusion detection system or deploying multiple intrusion detection devices.

[0003] A currently popular intrusion detection solution is the rule-based Network Intrusion Detection System (NIDS), which matches incoming network traffic against existing detection rules to detect intrusion behavior. A NIDS can monitor network traffic in real time, and its rule base can be designed incrementally, meaning that new rules can be created and added to the detection rule base when new network intrusion methods emerge. However, as the rule base grows, the consumption of system resources grows with it.

Summary of the Invention

[0004] This application provides a data acquisition method, apparatus, and storage medium for Internet simulation scenarios, which can reduce the system overhead of data acquisition.

[0005] In a first aspect, embodiments of this application provide a data acquisition method for internet simulation scenarios, including:

[0006] Vulnerability scanning is performed on the Internet simulation scenario, and the vulnerabilities obtained from the scan are classified to obtain a vulnerability set;

[0007] The vulnerability set is associated with a preset attack type classification dataset to obtain the basic attack corresponding to the vulnerability set. The attack type classification dataset includes the correspondence between vulnerabilities and intrusion attack behaviors. The basic attack represents the intrusion attack behavior in the Internet simulation scenario.

[0008] The data source on the terminal side is determined based on the basic attack and the preset attack model framework, and the traffic detection rules on the traffic side are determined based on the basic attack and the preset traffic detection rule base. The attack model framework includes the correspondence between attack types and data sources, and the traffic detection rule base includes the correspondence between vulnerability types and traffic detection rules.

[0009] Targeted data collection is performed based on the correspondence between the vulnerability aggregation, the data source, and the traffic detection rules.

[0010] In some embodiments, the step of performing vulnerability scanning on the Internet simulation scenario and classifying the vulnerabilities obtained from the scan to obtain a vulnerability collection includes:

[0011] Determine the network resources and network topology of the Internet simulation scenario;

[0012] Based on the network resources and network topology, the Internet simulation scenario is scanned to obtain multiple vulnerabilities, which correspond to the assets in the Internet simulation scenario.

[0013] The multiple vulnerabilities are grouped according to the classification information in CVE (Common Vulnerabilities & Exposures) to obtain a vulnerability group.

[0014] In some embodiments, the attack type classification dataset is CAPEC (Common Attack Pattern Enumeration and Classification); associating the vulnerability set with the preset attack type classification dataset to obtain the basic attack corresponding to the vulnerability set includes:

[0015] The vulnerability set is associated with the corresponding relationship between vulnerabilities and intrusion attack behaviors in CAPEC and the preset attack-vulnerability association information to obtain the correspondence between the vulnerability set and the basic attack.

[0016] The attack-vulnerability association information serves as a supplementary set to the correspondence between vulnerabilities and intrusion attack behaviors in CAPEC.

[0017] In some embodiments, determining the data source on the terminal side based on the basic attack and a preset attack model framework includes:

[0018] Define the attack model framework;

[0019] Establish a mapping between the basic attack and the attack types in the attack model framework, and determine the target data source corresponding to the basic attack.

[0020] In some embodiments, determining the attack model framework includes:

[0021] Based on the ATT&CK (Adversarial Tactics, Techniques, and Common Knowledge) framework, determine the tactics, techniques, sub-techniques, and corresponding data sources within the framework content;

[0022] An attack model framework is constructed based on the identified tactics, techniques, sub-techniques, and corresponding data sources.

[0023] In some embodiments, determining the traffic detection rules on the traffic side based on the basic attack and a preset traffic detection rule base includes:

[0024] Define the traffic detection rule base;

[0025] Establish a mapping between the vulnerability set corresponding to the basic attack and the vulnerability types in the traffic detection rule base, and determine the target traffic detection rule corresponding to the basic attack.

[0026] In some embodiments, determining the traffic detection rule base includes:

[0027] Collect Snort rule sets and perform structured processing on the data in the Snort rule sets;

[0028] A traffic detection rule base is constructed based on the structured Snort rule set.

[0029] In some embodiments, the targeted data collection based on the correspondence between the vulnerability aggregation, the data source, and the traffic detection rules includes:

[0030] Determine the current vulnerability cluster and the corresponding basic attack;

[0031] The data sources that need to be collected are determined based on the current basic attack.

[0032] The traffic detection rules to be used are determined based on the current vulnerability aggregation.

[0033] In a second aspect, embodiments of this application provide a data acquisition device for an internet simulation scenario, comprising:

[0034] The scanning module is used to perform vulnerability scanning on the Internet simulation scenario, and to classify the vulnerabilities obtained from the scan to obtain a vulnerability collection.

[0035] The basic attack association module is used to associate the vulnerability set with a preset attack type classification dataset to obtain the basic attack corresponding to the vulnerability set. The attack type classification dataset includes the correspondence between vulnerabilities and intrusion attack behaviors. The basic attack represents the intrusion attack behavior in the Internet simulation scenario.

[0036] The mapping module is used to determine the data source on the terminal side based on the basic attack and the preset attack model framework, and to determine the traffic detection rules on the traffic side based on the basic attack and the preset traffic detection rule base. The attack model framework includes the correspondence between attack types and data sources, and the traffic detection rule base includes the correspondence between vulnerability types and traffic detection rules.

[0037] The targeted data collection module is used to collect targeted data based on the correspondence between the vulnerability aggregation, the data source, and the traffic detection rules.

[0038] In a third aspect, embodiments of this application provide a data acquisition device for an Internet simulation scenario, including at least one processor and a memory communicatively connected to the at least one processor; the memory stores instructions executable by the at least one processor, which, when executed by the at least one processor, enable the at least one processor to perform the data acquisition method as described in the first aspect.

[0039] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing computer-executable instructions for causing a computer to perform the data acquisition method as described in the first aspect.

[0040] The data acquisition method, apparatus, and storage medium for Internet simulation scenarios provided in this application have at least the following beneficial effects: In Internet simulation scenarios, the vulnerabilities detected are classified and aggregated to obtain a vulnerability set, which is then associated with basic attacks. Targeted data acquisition is then performed separately on the terminal side and the traffic side. On the terminal side, the attack type in the attack model framework determines the data source to be collected, and on the traffic side, the vulnerability type in the traffic detection rule base determines the traffic detection rules. This is equivalent to collecting data only on these vulnerabilities or weaknesses to support intrusion detection during attack and defense exercises in Internet simulation scenarios. Compared to existing full-terminal, full-traffic acquisition, the embodiments of this application restrict acquisition to a targeted scope, significantly reducing system resource consumption and detection time overhead.

Description of the Drawings

[0041] Figure 1 is an overall flowchart of a data acquisition method for an Internet simulation scenario provided in one embodiment of this application;

[0042] Figure 2 is a flowchart of the specific method for step S101 in Figure 1;

[0043] Figure 3 is a flowchart of the specific method for step S102 in Figure 1;

[0044] Figure 4 is a flowchart of the specific method for step S103 in Figure 1;

[0045] Figure 5 is a flowchart of the specific method for step S401 in Figure 4;

[0046] Figure 6 is another flowchart of the specific method for step S103 in Figure 1;

[0047] Figure 7 is a flowchart of the specific method for step S601 in Figure 6;

[0048] Figure 8 is a flowchart of the specific method for step S104 in Figure 1;

[0049] Figure 9 is a schematic diagram of a data acquisition device for an Internet simulation scenario provided in one embodiment of this application;

[0050] Figure 10 is an example of a data acquisition method for an Internet simulation scenario provided in this application;

[0051] Figure 11 is a schematic diagram of the structure of a data acquisition device for an Internet simulation scenario provided in one embodiment of this application.

Detailed Implementation

[0052] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0053] In network technology simulation and verification platform applications, network attack and defense drills typically require constructing a simulated internet scenario for both attackers and defenders. This simulated internet scenario is a different network environment set up for different attack and defense drill objectives, such as attacks targeting a specific vulnerability or the reproduction of a security incident. Within this simulated internet environment, the attack and defense drills need to be monitored, specifically including monitoring the specific operations of both sides, analyzing attack and defense behaviors, and understanding the attack and defense mechanisms. Monitoring the attacker's specific behavior is essentially a network intrusion. To monitor and detect such intrusions, an intrusion detection system (IDS) is generally built within the scenario to collect network security data and analyze intrusion behavior. To improve the accuracy and comprehensiveness of detection, multiple intrusion detection methods are usually employed, such as building a network intrusion detection system or setting up multiple intrusion detection devices. This inevitably leads to increased resource consumption, and different IDS devices may produce different alerts for the same attack. Such differentiated alert information is not conducive to unified analysis of intrusion behavior and hinders the monitoring and analysis of network attacks.

[0054] Current popular intrusion detection solutions include rule-based Network Intrusion Detection Systems (NIDS), which detect intrusions by matching incoming network traffic from the internet against existing detection rules. Specifically, a rule-based NIDS system scans all traffic on a given network segment, processes the traffic data to extract usable information formats, and compares this information with known attack rules in its rule base. If a match is found, the system issues an alert indicating abnormal behavior. Generally, rule-based NIDS systems have high accuracy and can be designed incrementally, meaning new rules can be added to the rule base as new intrusion methods emerge. For example, Snort, an open-source intrusion detection software, uses pre-defined threat detection rules to monitor network traffic in real time and replay pcap data packets. Therefore, for existing rule-based NIDS systems, the detection rules are the core, and the completeness of the rule base is crucial for intrusion detection. However, continuously increasing the rule base will further increase system resource consumption.
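The resource concern described above can be illustrated with a minimal sketch. It assumes a naive matcher that tests every payload against every rule; the rule contents, rule identifiers, and payloads are hypothetical, and production engines such as Snort use more efficient multi-pattern matching, so the linear cost shown here is only an approximation of the trend.

```python
def match_payloads(payloads, rules):
    """Check every payload against every rule; return alerts and the evaluation count."""
    alerts = []
    evaluations = 0
    for payload in payloads:
        for sid, pattern in rules.items():
            evaluations += 1          # one rule evaluation per (payload, rule) pair
            if pattern in payload:
                alerts.append((sid, payload))
    return alerts, evaluations


# Hypothetical traffic and rule patterns, for illustration only.
payloads = [b"GET /index.html", b"GET /vulnerable.cgi?x=1", b"POST /login"]
full_rules = {1: b"/vulnerable.cgi", 2: b"/admin.php", 3: b"cmd.exe", 4: b"/etc/passwd"}
targeted_rules = {1: b"/vulnerable.cgi"}   # only the rule tied to a known weakness

full_alerts, full_cost = match_payloads(payloads, full_rules)
targeted_alerts, targeted_cost = match_payloads(payloads, targeted_rules)
```

In this toy case both rule sets raise the same alert, but the targeted set needs a quarter of the rule evaluations.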

[0055] To address the aforementioned issues, this embodiment provides a data acquisition method, apparatus, and storage medium for internet simulation scenarios. In these scenarios, scanned vulnerabilities are classified and aggregated to obtain a vulnerability set, which is then associated with basic attacks. Targeted data acquisition is then performed separately on the terminal side and the traffic side. On the terminal side, the attack types within the attack model framework determine the data sources to be collected. On the traffic side, the vulnerability types within the traffic detection rule base determine the traffic detection rules. In effect, during attack and defense drills in internet simulation scenarios, data is collected only for these vulnerabilities or weaknesses to support intrusion detection. Compared to existing full-terminal, full-traffic acquisition methods, this embodiment restricts acquisition to a targeted scope, significantly reducing system resource consumption and detection time overhead.

[0056] Referring to Figure 1, which is a flowchart of a data acquisition method provided in an embodiment of this application, the data acquisition method for Internet simulation scenarios includes, but is not limited to, the following steps S101 to S104.

[0057] Step S101: Perform vulnerability scanning on the Internet simulation scenario, classify the vulnerabilities obtained from the scan, and obtain a vulnerability collection.

[0058] In step S101 of some embodiments, vulnerability scanning is performed on the Internet simulation scenario to obtain the vulnerabilities present in different Internet simulation scenarios. The scanned vulnerabilities are then classified to obtain a vulnerability set that includes multiple vulnerability types. This provides a better basis for vulnerability analysis and assessment, and makes it easier to improve security strategies and vulnerability response capabilities in the future.

[0059] It should be noted that the Internet simulation scenario is a different network environment set up for different attack and defense experiment targets. During the setup of the Internet simulation scenario, information such as the topology of host resources, the network environment, and host vulnerabilities is recorded. Therefore, it is necessary to scan the Internet simulation scenario to prevent the occurrence of attacks. During the vulnerability scanning of the Internet simulation scenario, vulnerability scanning tools such as Nmap, Nessus, and OpenVAS can be used, and the vulnerabilities found are aggregated according to CVE (Common Vulnerabilities & Exposures). The vulnerability scanning process is described in detail below and is not repeated here.
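As a minimal sketch of how scan results might be reduced to CVE identifiers per asset, the following assumes that the scanner's findings are available as plain text for each asset. The report excerpts, addresses, and CVE numbers are placeholders rather than real scanner output, and the function name `extract_cves` is hypothetical.

```python
import re
from collections import defaultdict

# CVE identifiers have the form CVE-<4-digit year>-<4 or more digits>.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def extract_cves(scan_reports):
    """Map each asset to the sorted, de-duplicated CVE IDs found in its report text."""
    found = defaultdict(set)
    for asset, text in scan_reports.items():
        for match in CVE_PATTERN.findall(text):
            found[asset].add(match.upper())
    return {asset: sorted(ids) for asset, ids in found.items()}


# Placeholder report excerpts keyed by asset address (not real scanner output).
reports = {
    "10.0.0.5": "web service: CVE-2099-0001 detected; see also cve-2099-0002",
    "10.0.0.7": "file sharing service: CVE-2099-0003 likely vulnerable",
    "10.0.0.9": "no known vulnerabilities",
}
vulns_by_asset = extract_cves(reports)
```

Assets without any finding are simply absent from the result, so later steps only consider assets that actually expose a weakness.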

[0060] Step S102: Associate the vulnerability collection with the preset attack type classification dataset to obtain the basic attacks corresponding to the vulnerability collection.

[0061] It should be noted that the attack type classification dataset includes the correspondence between vulnerabilities and intrusion attack behaviors, and the basic attack represents the intrusion attack behavior in the Internet simulation scenario.

[0062] In step S102 of some embodiments, the vulnerability collection is associated with a preset attack type classification dataset to realize the association between vulnerabilities and basic attacks in Internet simulation scenarios. This effectively integrates the relationship between different systems, makes the data interconnected and complementary, expands the space for information expression, and each piece of data has multiple pieces of information associated with it.

[0063] It should be noted that the attack type classification dataset in this embodiment is the attack type enumeration and classification dataset CAPEC, which will be described in detail below, and will not be repeated here.

[0064] Step S103: Determine the data source on the terminal side based on the basic attack and the preset attack model framework, and determine the traffic detection rules on the traffic side based on the basic attack and the preset traffic detection rule base.

[0065] It should be noted that the attack model framework includes the correspondence between attack types and data sources, and the traffic detection rule base includes the correspondence between vulnerability types and traffic detection rules.

[0066] In step S103 of some embodiments, the data source on the terminal side is determined based on the basic attack and the preset attack model framework to facilitate subsequent targeted data collection on the terminal side, and the traffic detection rules on the traffic side are determined based on the basic attack and the preset traffic detection rule base to facilitate subsequent data collection on the traffic side. Thus, when conducting attack and defense exercises in Internet simulation scenarios, data collection can be performed only on these vulnerabilities or weaknesses to support intrusion detection, without the need for full traffic collection, thereby reducing the consumption of system resources.

[0067] Step S104: Targeted data collection is performed based on the correspondence between vulnerability aggregation, data source, and traffic detection rules.

[0068] In step S104 of some embodiments, targeted data collection is performed based on the correspondence between vulnerability aggregation and data source and traffic detection rules, which realizes targeted range collection and greatly reduces system resource consumption and detection time overhead.
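Step S104 can be sketched as the construction of a collection plan. The sketch assumes that the three correspondences (vulnerability to basic attack, basic attack to data source, vulnerability to traffic detection rule) are already available as dictionaries; the names `CollectionPlan` and `build_collection_plan`, the placeholder CVE numbers, and the rule identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionPlan:
    data_sources: list = field(default_factory=list)  # terminal side
    rule_sids: list = field(default_factory=list)     # traffic side


def build_collection_plan(current_vulns, vuln_to_attacks, attack_to_sources, vuln_to_rules):
    """Derive what to collect from the current vulnerability set only."""
    plan = CollectionPlan()
    for vuln in current_vulns:
        # Traffic side: rules are selected directly from the vulnerability.
        for sid in vuln_to_rules.get(vuln, []):
            if sid not in plan.rule_sids:
                plan.rule_sids.append(sid)
        # Terminal side: data sources are selected through the basic attack.
        for attack in vuln_to_attacks.get(vuln, []):
            for source in attack_to_sources.get(attack, []):
                if source not in plan.data_sources:
                    plan.data_sources.append(source)
    return plan


plan = build_collection_plan(
    current_vulns=["CVE-2099-0001", "CVE-2099-0002"],
    vuln_to_attacks={"CVE-2099-0001": ["CAPEC-98"], "CVE-2099-0002": ["CAPEC-98"]},
    attack_to_sources={"CAPEC-98": ["application log", "network traffic"]},
    vuln_to_rules={"CVE-2099-0001": [1000001], "CVE-2099-0002": [1000002, 1000001]},
)
```

Anything not reachable from the current vulnerability set never enters the plan, which is what limits the collection scope.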

[0069] Referring to Figure 2, which is a flowchart of the specific method for step S101 in Figure 1 and a further explanation of step S101, step S101 includes, but is not limited to, steps S201 to S203.

[0070] Step S201: Determine the network resources and network topology of the Internet simulation scenario.

[0071] Step S202: Scan the Internet simulation scenario based on network resources and network topology to obtain multiple vulnerabilities, and the vulnerabilities correspond to the assets in the Internet simulation scenario.

[0072] Step S203: Group multiple vulnerabilities according to the classification information in the publicly disclosed CVEs to obtain a vulnerability group.

[0073] In steps S201 to S203 of some embodiments, during the vulnerability scanning of the Internet simulation scenario, the network resources and network topology of the Internet simulation scenario are first determined, so as to simulate the real network environment and better understand the interaction and behavior between the various components in the network. Then, the Internet simulation scenario is scanned according to the network resources and network topology to obtain multiple vulnerabilities corresponding to the assets of the Internet simulation scenario. Finally, the multiple vulnerabilities are grouped according to the classification information in CVE to obtain a vulnerability set, which provides a better basis for vulnerability analysis and assessment and facilitates subsequent improvement of security strategies and vulnerability response capabilities.

[0074] It should be noted that, in this embodiment, CVE is equivalent to a dictionary, providing a common name for widely recognized information security vulnerabilities or exposed weaknesses. Using a common name helps users share data across their respective vulnerability databases and vulnerability assessment tools. For example, if a vulnerability report specifies a vulnerability with a CVE name, corresponding patch information can be quickly found in any other CVE-compatible database, thereby further addressing the security issue.

[0075] Understandably, the classification information in CVEs can be vulnerability information collected from different vulnerability disclosure channels, such as vulnerability databases and vendor announcements. This classification information includes, but is not limited to, vulnerability name, vulnerability number, and vulnerability description. During the process of aggregating multiple vulnerabilities according to the classification information in publicly disclosed CVEs, the collected vulnerabilities are categorized into the corresponding classification dimensions based on their classification information. This facilitates subsequent querying, analysis, and summarization.
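The grouping in step S203 can be sketched as follows. The sketch assumes each scanned vulnerability carries a single classification label; the `category` field, the class name `Vulnerability`, and all values are hypothetical and stand in for whatever classification dimension is taken from the CVE information.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Vulnerability:
    cve_id: str
    asset: str
    category: str  # hypothetical classification label derived from CVE information


def build_vulnerability_set(vulns):
    """Group scanned vulnerabilities by their classification label."""
    groups = defaultdict(list)
    for vuln in vulns:
        groups[vuln.category].append(vuln)
    return dict(groups)


# Placeholder findings; the CVE numbers and categories are illustrative only.
vulns = [
    Vulnerability("CVE-2099-0001", "10.0.0.5", "remote code execution"),
    Vulnerability("CVE-2099-0002", "10.0.0.7", "remote code execution"),
    Vulnerability("CVE-2099-0003", "10.0.0.7", "denial of service"),
]
vulnerability_set = build_vulnerability_set(vulns)
```

Each vulnerability keeps its asset, so the grouped result can still be traced back to the hosts in the simulation scenario.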

[0076] Referring to Figure 3, which is a flowchart of the specific method for step S102 in Figure 1 and a further explanation of step S102, step S102 includes, but is not limited to, step S301.

[0077] Step S301: Based on the correspondence between vulnerabilities and intrusion attack behaviors in CAPEC and the preset attack-vulnerability association information, the vulnerability collection is associated to obtain the correspondence between the vulnerability collection and the basic attack.

[0078] It should be noted that the attack-vulnerability association information serves as a supplementary set to the correspondence between vulnerabilities and intrusion attack behaviors in CAPEC.

[0079] Understandably, CAPEC is a classification dataset for common attack types, with common attack patterns including: authorization attacks - remote file execution privilege escalation, authentication attacks - cross-domain request forgery, etc.

[0080] In step S301 of some embodiments, the vulnerability collection is associated with the correspondence between vulnerabilities and intrusion attack behaviors in CAPEC and the preset attack-vulnerability association information to obtain the correspondence between the vulnerability collection and the basic attack, thereby realizing the association between the vulnerability collection and the basic attack, which facilitates the remediation of the vulnerability collection.

[0081] It should be noted that the classification standard for basic attacks in this embodiment comes from CAPEC. In the process of associating the vulnerability collection, CAPEC can be used for automatic association, or a semi-automatic method can be used. The semi-automatic method relies on data analysis and machine algorithms, combined with human intervention to provide results, thereby achieving accurate association of the vulnerability collection.
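The association in step S301 can be sketched as a lookup over two tables: one derived from CAPEC and one holding the supplementary attack-vulnerability association information. Both tables below are hypothetical placeholders, not real CAPEC content, and the function name `associate_basic_attacks` is an assumption made for illustration.

```python
def associate_basic_attacks(cve_ids, capec_map, supplement):
    """Return, for each vulnerability, the union of attacks from both tables."""
    result = {}
    for cve in cve_ids:
        attacks = set(capec_map.get(cve, ())) | set(supplement.get(cve, ()))
        result[cve] = sorted(attacks)
    return result


# Placeholder correspondences; real entries would come from CAPEC and from the
# preset attack-vulnerability association information.
capec_map = {"CVE-2099-0001": ["CAPEC-98"]}
supplement = {"CVE-2099-0001": ["CAPEC-469"], "CVE-2099-0002": ["CAPEC-98"]}

associations = associate_basic_attacks(
    ["CVE-2099-0001", "CVE-2099-0002", "CVE-2099-0003"], capec_map, supplement
)
# Vulnerabilities with no automatic match are candidates for manual review.
unmatched = [cve for cve, attacks in associations.items() if not attacks]
```

Keeping the unmatched list separate fits the semi-automatic approach, in which automatic association is combined with human intervention.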

[0082] In some embodiments, the CAPEC dataset is a database of classifications and descriptions of common attack patterns, including but not limited to attack pattern number information, attack pattern name information, attack pattern category information, attack pattern attribute information, and so on.

[0083] Referring to Figure 4, which is a flowchart of the specific method for step S103 in Figure 1, step S103 includes, but is not limited to, the following steps S401 to S402.

[0084] It is understandable that in the process of determining the data source on the terminal side based on the basic attack and the preset attack model framework, the main approach is to use the ATT&CK framework to target and collect data on the terminal side, such as operation logs and security events. Then, based on the definition of the data source in the ATT&CK framework, a certain part of the data is targeted and collected. The following is a detailed explanation.

[0085] Step S401: Determine the attack model framework.

[0086] Step S402: Establish the mapping between basic attacks and attack types in the attack model framework, and determine the target data source corresponding to the basic attack.

[0087] In steps S401 to S402 of some embodiments, during the process of determining the data source on the terminal side, it is necessary to determine the attack model framework. In this embodiment, the attack model framework is the ATT&CK framework, which allows for a clearer observation of the attacker's relatively continuous attack techniques. Due to the completeness of the framework, the attacker's attack coverage can also be analyzed. Subsequently, a mapping between basic attacks and attack types in the attack model framework is established, enabling the attack model framework to collect data sources and determine the target data source corresponding to the basic attacks. This allows for the acquisition of the data source of the basic attacks on the terminal side, enabling rapid detection of the data source, clearer analysis of attack coverage, and improved detection and defense capabilities against complex attacks and threat behaviors.

[0088] It should be noted that ATT&CK (Adversarial Tactics, Techniques, and Common Knowledge) is a framework used to describe and classify adversarial behaviors based on real-world attack and defense data. The ATT&CK framework can comprehensively map attacker behavior, allowing for clearer observation of continuous attack techniques and, due to its comprehensiveness, analysis of attack coverage. The data source object primarily defines which data sources can be collected to pinpoint the attack technique. For example, the T1566 phishing attack technique uses application logs, file creation data, and network traffic data as data sources; collecting these data sources allows for faster identification of the attack's scope. Furthermore, the ATT&CK knowledge base has certain correlations with CAPEC attack classifications, such as CAPEC-98: Phishing and ATT&CK T1566: Phishing, or CAPEC-469: HTTP DoS and ATT&CK T1499.002: Endpoint Denial of Service: Service Exhaustion Flood, further enriching the relevance of the attack model framework.
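The mapping in step S402 can be sketched with the two correlations named above. The CAPEC-to-technique pairs and the T1566 data sources are taken from this paragraph; the data source listed for T1499.002 is an assumption added for illustration, and the function name `data_sources_for` is hypothetical.

```python
# Correlations named in the description.
CAPEC_TO_ATTACK = {
    "CAPEC-98": "T1566",       # Phishing
    "CAPEC-469": "T1499.002",  # Endpoint Denial of Service: Service Exhaustion Flood
}

TECHNIQUE_DATA_SOURCES = {
    "T1566": ["application log", "file creation", "network traffic"],
    "T1499.002": ["network traffic"],  # assumed for illustration
}


def data_sources_for(basic_attacks):
    """Collect the terminal-side data sources for a list of basic attacks."""
    sources = []
    for attack in basic_attacks:
        technique = CAPEC_TO_ATTACK.get(attack)
        for source in TECHNIQUE_DATA_SOURCES.get(technique, []):
            if source not in sources:   # keep first-seen order, drop duplicates
                sources.append(source)
    return sources


target_sources = data_sources_for(["CAPEC-98", "CAPEC-469"])
```

A basic attack with no mapped technique contributes nothing, so only data sources tied to a known attack are scheduled for collection.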

[0089] Referring to Figure 5, which is a flowchart of the specific method for step S401 in Figure 4 and a further explanation of step S401, step S401 includes, but is not limited to, steps S501 to S502.

[0090] Step S501: Based on the ATT&CK (Adversarial Tactics, Techniques, and Common Knowledge) framework, determine the tactics, techniques, sub-techniques, and corresponding data sources in the framework content.

[0091] Step S502: Construct an attack model framework based on the determined tactics, techniques, sub-techniques and corresponding data sources.

[0092] In steps S501 to S502 of some embodiments, during the process of determining the attack model framework, the tactics, techniques, sub-techniques and corresponding data sources in the framework content are first determined according to the ATT&CK framework, so as to understand the attacker's action patterns and targets and to support intelligence collection and analysis as well as attack detection and defense in Internet simulation scenarios. The attack model framework is then constructed from the determined tactics, techniques, sub-techniques and corresponding data sources, to deepen the understanding of different types of attacks and improve threat identification capability.

[0093] It's important to note that the ATT&CK knowledge base contains data sources for each attack technique. These data sources define what data needs to be collected to detect these techniques, such as information on process creation, registry modifications, etc. Furthermore, attack techniques or sub-techniques within the ATT&CK framework have data source attributes; the data source indicates the possible data sources needed to detect that attack technique. For example, detecting the T1543.003 Windows service sub-technique involves data sources such as processes, the Windows Registry, and services. Taking processes as an example, this means collecting event logs of process creation. Similarly, using the Windows Registry as an example, this means collecting event logs of Windows Registry key modifications and creation.
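Steps S501 and S502 can be sketched as building an index from technique records. The record layout and the function names are hypothetical; the technique identifiers and data sources follow the examples in this description, and the fallback from a sub-technique to its parent technique is a design assumption, not something the description specifies.

```python
def build_framework(records):
    """Index technique records by ID, keeping tactics and data sources."""
    framework = {}
    for rec in records:
        framework[rec["id"]] = {
            "name": rec["name"],
            "tactics": list(rec["tactics"]),
            "data_sources": list(rec["data_sources"]),
        }
    return framework


def lookup_data_sources(framework, technique_id):
    """Return data sources for a technique, falling back to the parent technique."""
    entry = framework.get(technique_id)
    if entry is None and "." in technique_id:
        # Assumption: an unknown sub-technique inherits from its parent technique.
        entry = framework.get(technique_id.split(".", 1)[0])
    return entry["data_sources"] if entry else []


# Hypothetical record layout populated with the examples from the description.
records = [
    {"id": "T1566", "name": "Phishing", "tactics": ["initial-access"],
     "data_sources": ["application log", "file creation", "network traffic"]},
    {"id": "T1543.003", "name": "Create or Modify System Process: Windows Service",
     "tactics": ["persistence", "privilege-escalation"],
     "data_sources": ["process", "windows registry", "service"]},
]
framework = build_framework(records)
```

With this index, `lookup_data_sources(framework, "T1543.003")` yields the process, Windows Registry, and service data sources, which correspond to collecting process creation events and registry key creation or modification events.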

[0094] Refer to Figure 6, which is another flowchart of a specific method for step S103 in Figure 1. As a further explanation of step S103, the step includes, but is not limited to, steps S601 to S602.

[0095] Step S601: Determine the traffic detection rule base.

[0096] In step S601 of some embodiments, a traffic detection rule base is determined to monitor network traffic in real time and perform rule matching.

[0097] Refer to Figure 7, which is a flowchart of a specific method for step S601 in Figure 6. As a further explanation of step S601, the step includes, but is not limited to, steps S701 to S702.

[0098] Step S701: Collect the Snort rule set and perform structured processing on the data in the Snort rule set.

[0099] Step S702: Construct a traffic detection rule base based on the structured Snort rule set.

[0100] In steps S701 to S702 of some embodiments, when the traffic detection rule base is determined, the Snort rule set is first collected and its data is structured to facilitate information extraction. The traffic detection rule base is then constructed from the structured Snort rule set, so as to meet the security monitoring and defense needs of Internet simulation scenarios.

[0101] Step S602: Establish a mapping between the vulnerability set corresponding to the basic attack and the vulnerability types in the traffic detection rule base, and determine the target traffic detection rule corresponding to the basic attack.

[0102] In steps S601 to S602 of some embodiments, during the process of determining the traffic detection rules on the traffic side, it is necessary to determine the traffic detection rule base to monitor network traffic in real time and match rules. Then, a mapping is established between the vulnerability set corresponding to the basic attack and the vulnerability type in the traffic detection rule base, so as to obtain the detection rule set corresponding to each different network attack, and determine the target traffic detection rule corresponding to the basic attack, thereby realizing the association between the vulnerability set, the basic attack, and the traffic detection rule base.

[0103] It should be noted that many Snort rule descriptions contain CVE vulnerability information. In the process of establishing the mapping between the vulnerability set corresponding to the basic attack and the vulnerability type in the traffic detection rule base, the CVE vulnerability information in the rules can be extracted, i.e., Snort.rule<->CVE vulnerability. Then, by using the association between the known vulnerability set and the attack type classification dataset and the CVE information in Snort.rule, the mapping between the basic attack vulnerability and Snort.rule can be established. This mapping is the detection rule set corresponding to each different network attack, so as to determine the target traffic detection rule corresponding to the basic attack.
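
The mapping described above can be sketched in a few lines. This is an illustration under stated assumptions: CVE information is assumed to appear in each rule as a `reference:cve,<year>-<number>` option and each rule is assumed to carry a `sid` option; the function names, the CVE identifiers, and the basic attack labels are hypothetical.

```python
import re
from collections import defaultdict

# Assumed form of the CVE information inside a rule: reference:cve,2099-0001;
CVE_REF = re.compile(r"reference\s*:\s*cve\s*,\s*(\d{4}-\d{4,})", re.IGNORECASE)
SID = re.compile(r"\bsid\s*:\s*(\d+)")


def rules_by_cve(rule_lines):
    """Build the Snort.rule <-> CVE relation as CVE -> {sid}."""
    index = defaultdict(set)
    for line in rule_lines:
        sid = SID.search(line)
        if sid is None:
            continue  # no rule identifier to refer back to
        for cve in CVE_REF.findall(line):
            index[f"CVE-{cve}"].add(int(sid.group(1)))
    return index


def rules_by_basic_attack(cve_to_ba, cve_to_sids):
    """Join CVE -> basic attack with CVE -> sid to get basic attack -> {sid}."""
    result = defaultdict(set)
    for cve, attacks in cve_to_ba.items():
        for attack in attacks:
            result[attack] |= cve_to_sids.get(cve, set())
    return result


# Made-up rules and associations, for illustration only.
RULES = [
    'alert tcp any any -> any 80 (msg:"A"; reference:cve,2099-0001; sid:1000001; rev:1;)',
    'alert tcp any any -> any 445 (msg:"B"; reference:cve,2099-0002; '
    'reference:cve,2099-0001; sid:1000002; rev:1;)',
    'alert udp any any -> any 53 (msg:"C"; sid:1000003; rev:1;)',
]
CVE_TO_BA = {"CVE-2099-0001": {"BA-injection"}, "CVE-2099-0002": {"BA-overflow"}}

CVE_TO_SIDS = rules_by_cve(RULES)
BA_TO_SIDS = rules_by_basic_attack(CVE_TO_BA, CVE_TO_SIDS)
```

Rules without any CVE reference, such as the third one here, never enter the index, so they are not selected by a vulnerability-driven lookup.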

[0104] It can be understood that the traffic side typically deals with monitored traffic information or pcap packets. Here, Snort is used as an example of an intrusion detection system. Many detection rules in the Snort rule set identify the CVE vulnerabilities they correspond to. Starting from this relationship between CVE vulnerabilities and detection rules, the association between Snort rules and CVE vulnerabilities can be mapped: [Snort.rule <-> CVE vulnerability <-> basic attack]. In this way, a vulnerability existing in an Internet simulation scenario can be mapped on the traffic side to one or more detection rules. During attack and defense drills in the simulation scenario, only the rules corresponding to the vulnerability need to be used to detect intrusion behavior, rather than the traditional full rule set, which significantly reduces system resource consumption.

[0105] It is worth noting that the Snort rule set is a set of rules used in network intrusion detection and intrusion prevention systems. It includes information such as rule headers and rule options. The rule header includes the rule's metadata, such as protocol, source address, and destination address, to identify and match traffic in the network. The rule options are used to detect and match specific protocols, ports, data content, flags, etc., to determine whether potential intrusion has occurred.
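
The split into rule header and rule options can be illustrated with a simplified parser. This is a sketch only: it assumes a single-line rule with a seven-field header and no spaces inside address lists, the function name `parse_snort_rule` is hypothetical, and the example rule is made up.

```python
import re

HEADER_FIELDS = (
    "action", "protocol", "src_addr", "src_port",
    "direction", "dst_addr", "dst_port",
)


def parse_snort_rule(rule: str) -> dict:
    """Split one rule into a header dict and an ordered list of (option, value)."""
    header_text, sep, rest = rule.strip().partition("(")
    if not sep or not rest.endswith(")"):
        raise ValueError("expected 'header (options)'")
    header_parts = header_text.split()
    if len(header_parts) != len(HEADER_FIELDS):
        raise ValueError("expected a 7-field rule header")
    header = dict(zip(HEADER_FIELDS, header_parts))

    options = []
    # Split on ';' unless it is escaped as '\;' inside an option value.
    for chunk in re.split(r"(?<!\\);", rest[:-1]):
        chunk = chunk.strip()
        if not chunk:
            continue
        key, _, value = chunk.partition(":")
        options.append((key.strip(), value.strip()))
    return {"header": header, "options": options}


EXAMPLE_RULE = (
    'alert tcp $EXTERNAL_NET any -> $HOME_NET 80 '
    '(msg:"Example exploit attempt"; content:"/cgi-bin/"; '
    'reference:cve,2099-0001; sid:1000001; rev:1;)'
)
PARSED = parse_snort_rule(EXAMPLE_RULE)
```

Options are kept as an ordered list of pairs rather than a dictionary because an option such as `content` or `reference` may appear more than once in a rule.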

[0106] Refer to Figure 8, which is a flowchart of a specific method for step S104 in Figure 1. As a further explanation of step S104, the step includes, but is not limited to, steps S801 to S803.

[0107] Step S801: Determine the current vulnerability set and the current basic attack corresponding to the current vulnerability set.

[0108] Step S802: Determine the data source to be collected based on the current basic attack.

[0109] Step S803: Determine the traffic detection rules to be used based on the current vulnerability set.

[0110] In steps S801 to S803 of some embodiments, during the targeted data collection process, firstly, the current vulnerability set and the current basic attack corresponding to the current vulnerability set are determined, thereby determining the specific attack type in the Internet simulation scenario. Then, the data source to be collected is determined based on the current basic attack, realizing data source collection on the terminal side. Finally, the traffic detection rules to be adopted are determined based on the current vulnerability set, realizing the specification of traffic detection rules on the traffic side. This enables data collection only for vulnerabilities or weaknesses, achieving targeted scope collection and significantly reducing system resource consumption and detection time overhead.
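
Steps S801 to S803 can be sketched as three lookups over mappings that are assumed to have been built beforehand. The `CollectionPlan` class, the `plan_collection` function, and all identifiers in the example data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CollectionPlan:
    basic_attacks: set   # result of S801
    data_sources: set    # result of S802 (terminal side)
    rule_ids: set        # result of S803 (traffic side)


def plan_collection(current_cves, cve_to_ba, ba_to_sources, cve_to_rules):
    """Derive the targeted collection scope for the current vulnerability set."""
    # S801: basic attacks corresponding to the current vulnerability set.
    attacks = set()
    for cve in current_cves:
        attacks |= cve_to_ba.get(cve, set())

    # S802: data sources to collect, determined by the basic attacks.
    sources = set()
    for attack in attacks:
        sources |= ba_to_sources.get(attack, set())

    # S803: traffic detection rules, determined by the vulnerability set.
    rules = set()
    for cve in current_cves:
        rules |= cve_to_rules.get(cve, set())

    return CollectionPlan(attacks, sources, rules)


# Made-up mappings, for illustration only.
PLAN = plan_collection(
    current_cves={"CVE-2099-0001"},
    cve_to_ba={"CVE-2099-0001": {"BA-persistence"}, "CVE-2099-0002": {"BA-overflow"}},
    ba_to_sources={"BA-persistence": {"Process", "Windows Registry"}},
    cve_to_rules={"CVE-2099-0001": {1000001}, "CVE-2099-0002": {1000002}},
)
```

Vulnerabilities that are absent from the scenario contribute nothing to the plan, which is what keeps the collection scope small.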

[0111] Referring to Figure 9, this application also provides a data acquisition device for Internet simulation scenarios, the device comprising:

[0112] The scanning module 901 is used to perform vulnerability scanning on the Internet simulation scenario and to classify the vulnerabilities obtained from the scan to obtain a vulnerability set.

[0113] The basic attack association module 902 is used to associate the vulnerability set with the preset attack type classification dataset to obtain the basic attack corresponding to the vulnerability set. The attack type classification dataset includes the correspondence between vulnerabilities and intrusion attack behaviors. The basic attack represents the intrusion attack behavior in the Internet simulation scenario.

[0114] The mapping module 903 is used to determine the data source on the terminal side based on the basic attack and the preset attack model framework, and to determine the traffic detection rules on the traffic side based on the basic attack and the preset traffic detection rule base. The attack model framework includes the correspondence between attack types and data sources, and the traffic detection rule base includes the correspondence between vulnerability types and traffic detection rules.

[0115] The targeted data collection module 904 is used to perform targeted data collection based on the correspondence between the vulnerability set, the data source, and the traffic detection rules.

[0116] The specific implementation of the data acquisition device for Internet simulation scenarios is basically the same as the specific implementation of the data acquisition method for Internet simulation scenarios described above, and will not be repeated here.

[0117] To more clearly illustrate the data collection methods for Internet simulation scenarios described above, a specific example is provided below.

[0118] Example 1:

[0119] Example 1 is a specific example of data collection in an Internet simulation scenario.

[0120] In an Internet simulation scenario, this embodiment proactively detects asset vulnerabilities in the scenario. By identifying in advance the vulnerabilities or weaknesses that different network intrusion behaviors may attack, intrusion detection rules are set for those vulnerabilities or weaknesses beforehand, and these rules determine the scope of data collection. Compared with traditional full-traffic collection, targeted collection focuses the collection scope on the locations of vulnerabilities or weaknesses, reducing the consumption of system resources.

[0121] During the data acquisition process, this embodiment first collects and organizes scene resources in an internet simulation scenario to determine the current scene topology and scan for vulnerabilities or weaknesses that may be exploited by attackers. By pre-setting data acquisition schemes for these vulnerabilities, only these acquisition rules are activated during intrusion detection, significantly reducing the scope of data acquisition. If a detection is successful, it becomes clear which vulnerabilities have been exploited and which have not. Furthermore, by mapping ATT&CK, the attack coverage can be analyzed more clearly.

[0122] This embodiment can be divided into three stages: the first stage is the basic information collection and association stage, the second stage is the terminal-side targeted data collection stage, and the third stage is the traffic-side targeted data collection stage.

[0123] Refer to Figure 10, which is a flowchart of a specific data acquisition method provided in this application example. The data acquisition method for Internet simulation scenarios includes, but is not limited to, the following steps S1 to S10.

[0124] The specific steps are as follows:

[0125] The primary objective of the first phase is to collect data on resources, data, and vulnerabilities from hosts within a simulated environment. Internet simulation environments are different network environments established to target various attack and defense experiments. These targets might include attacks on specific vulnerabilities or the reproduction of certain security incidents. This phase records host resource topology information, network environment, and host vulnerabilities. Collected host vulnerabilities are categorized under the CVE system, and attack behaviors are classified based on vulnerability information.

[0126] The first phase includes the following steps:

[0127] Step S1: Use vulnerability collection tools to scan the current Internet simulation scenario and collect vulnerabilities according to the CVE system.

[0128] Step S2: The scanned CVE vulnerabilities are classified by being associated with the BA system through semi-automatic and expert knowledge methods, and the relationship between CVE and BA is established.

[0129] It should be noted that the BA (Basic Attack) system is based on CAPEC, a classification dataset of common attack types. CAPEC has been explained in detail earlier in this description, so the explanation is not repeated here.
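
One way to picture the semi-automatic association of step S2 is to merge an automatically derived CVE-to-BA table with an expert-supplied supplement and to flag whatever remains unmatched for manual review. This is only a sketch of that idea, not the procedure used in the application; the function name, the table contents, and the BA labels are hypothetical.

```python
def associate_cves_with_basic_attacks(cves, automatic, expert):
    """Return (CVE -> set of BA labels, CVEs that still need manual review)."""
    associations, needs_review = {}, []
    for cve in cves:
        # The expert table supplements, rather than replaces, the automatic one.
        attacks = set(automatic.get(cve, ())) | set(expert.get(cve, ()))
        if attacks:
            associations[cve] = attacks
        else:
            needs_review.append(cve)
    return associations, needs_review


# Made-up scan result and association tables, for illustration only.
SCANNED = ["CVE-2099-0001", "CVE-2099-0002", "CVE-2099-0003"]
AUTOMATIC = {"CVE-2099-0001": {"BA-injection"}}
EXPERT = {"CVE-2099-0001": {"BA-privilege"}, "CVE-2099-0002": {"BA-overflow"}}

ASSOCIATIONS, NEEDS_REVIEW = associate_cves_with_basic_attacks(
    SCANNED, AUTOMATIC, EXPERT
)
```

The review list makes the manual part of the process explicit: every scanned CVE ends up either associated with at least one basic attack or queued for an expert.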

[0130] The second stage is the terminal-side targeted data collection stage. The main purpose is to collect terminal-side data, such as operation logs and security events, through the ATT&CK system. A specific part of the data is collected based on the definition of the data source in the ATT&CK system.

[0131] The second phase includes the following steps:

[0132] Step S3: ATT&CK system data collection, which collects the tactics, techniques, sub-techniques, and corresponding data sources within the framework.

[0133] Step S4: Association construction, which maps basic attacks (BA) to techniques in the ATT&CK system using semi-automatic methods and expert knowledge.

[0134] Step S5: ATT&CK system data source collection.

[0135] In the ATT&CK system, attack techniques or sub-techniques have a data source attribute, which lists the data that may be needed to detect the technique. For example, when detecting the T1543.003 Windows Service sub-technique, the data sources can be processes, the Windows Registry, services, and so on. For the process data source, this means collecting event logs of process creation. For the Windows Registry data source, this means collecting event logs of Windows Registry key modification and Windows Registry key creation.

[0136] Step S6: Based on the information from the above steps, construct the following association chain: CVE vulnerability <-> basic attack BA <-> ATT&CK technique <-> ATT&CK data source <-> terminal-side data.
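
The association chain of step S6 can be viewed as a composition of one-to-many relations. The sketch below composes them left to right so that a CVE resolves directly to the terminal-side data to collect; `compose` and `chain` are hypothetical helper names, and the CVE identifier and BA label are made up.

```python
from functools import reduce


def compose(first, second):
    """Compose a -> {b} with b -> {c} into a -> {c}."""
    return {
        a: set().union(*(second.get(b, set()) for b in bs))
        for a, bs in first.items()
    }


def chain(*relations):
    """Compose any number of relations from left to right."""
    return reduce(compose, relations)


CVE_TO_BA = {"CVE-2099-0001": {"BA-persistence"}}
BA_TO_TECHNIQUE = {"BA-persistence": {"T1543.003"}}
TECHNIQUE_TO_SOURCE = {"T1543.003": {"Process", "Windows Registry", "Service"}}
SOURCE_TO_DATA = {
    "Process": {"Process Creation"},
    "Windows Registry": {
        "Windows Registry Key Modification",
        "Windows Registry Key Creation",
    },
    # No terminal-side events are listed for "Service" in the text.
}

CVE_TO_TERMINAL_DATA = chain(
    CVE_TO_BA, BA_TO_TECHNIQUE, TECHNIQUE_TO_SOURCE, SOURCE_TO_DATA
)
```

Because each link is kept as a separate relation, a single link (for example the BA-to-technique mapping maintained by experts) can be updated without rebuilding the others.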

[0137] The third stage is the traffic-side targeted data collection stage. Since rule-based network intrusion detection systems are all centered on detection rule bases, this stage uses Snort as an example to detect traffic. Snort's traffic detection is also centered on its rule set. The purpose of this invention is to set different rule detection sets for different network attacks. When performing intrusion detection in a simulation scenario, only the rules related to vulnerabilities or weaknesses existing in the scenario need to be enabled, rather than the traditional full traffic collection, thereby improving detection efficiency and reducing system resource consumption.

[0138] The third stage includes the following steps:

[0139] Step S7: Collect the Snort rule set and perform structured processing to facilitate information extraction.

[0140] Step S8: CVE vulnerability information exists in many rule descriptions in Snort. Extract the CVE vulnerability information from the rule, i.e., Snort.rule<->CVE vulnerability.

[0141] Step S9: Establish a mapping between basic attack BA-CVE vulnerability-Snort.rule using the known CVE-BA association and the CVE information in Snort.rule. This mapping is the set of detection rules corresponding to each different network attack.

[0142] Step S10: Based on the information from the above steps, construct the CVE vulnerability <-> basic attack BA <-> Snort.rule association.
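
Once the association is available, enabling only the relevant rules amounts to filtering the full rule set by rule identifier. The following sketch assumes each rule occupies one line and carries a `sid` option; `select_rules` is a hypothetical helper and the rules are made up.

```python
import re

SID = re.compile(r"\bsid\s*:\s*(\d+)")


def select_rules(rule_lines, enabled_sids):
    """Keep only the rule lines whose sid is in enabled_sids."""
    selected = []
    for line in rule_lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = SID.search(stripped)
        if match and int(match.group(1)) in enabled_sids:
            selected.append(stripped)
    return selected


FULL_RULE_SET = [
    "# example rule file",
    'alert tcp any any -> any 80 (msg:"A"; reference:cve,2099-0001; sid:1000001; rev:1;)',
    'alert tcp any any -> any 445 (msg:"B"; reference:cve,2099-0002; sid:1000002; rev:1;)',
    "",
    'alert udp any any -> any 53 (msg:"C"; sid:1000003; rev:1;)',
]

# Rule identifiers mapped from the vulnerabilities present in the scenario.
TARGETED_RULES = select_rules(FULL_RULE_SET, enabled_sids={1000001})
```

The resulting list can be written to a separate rule file so that the intrusion detection system loads only the targeted subset during a drill.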

[0143] In the example provided by this invention, the collection scope can be set for vulnerabilities or weaknesses that may be exploited in Internet simulation scenarios. Each vulnerability is associated with a small data collection scope through mapping. The terminal side relies on the data source attributes of ATT&CK, and the traffic side relies on the rules of the network intrusion detection system. When conducting attack and defense exercises in the simulation scenario, data is collected only for these vulnerabilities or weaknesses to support intrusion detection. This allows for a clearer analysis of vulnerability exploitation and single-step attack coverage. Furthermore, attack behaviors can be mapped in the ATT&CK framework, providing a further analytical basis for multi-step attack detection.

[0144] Referring to Figure 11, the data acquisition device 1000 for an Internet simulation scenario is described by taking as an example a control processor 1001 and a memory 1002 connected via a bus. The memory 1002, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer-executable programs. Furthermore, the memory 1002 may include high-speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 1002 may optionally include memory located remotely from the control processor 1001, and such remote memory can be connected to the data acquisition device 1000 for the Internet simulation scenario via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.

[0145] Those skilled in the art will understand that the device structure shown in Figure 11 does not constitute a limitation on the data acquisition device 1000 for Internet simulation scenarios. The device may include more or fewer components than shown, combine certain components, or have a different component arrangement.

[0146] This application also provides a computer-readable storage medium storing computer-executable instructions. When executed by one or more control processors, for example by one of the control processors 1001 in Figure 11, the instructions cause the one or more control processors to execute the data acquisition method for the Internet simulation scenario described in the above method embodiments.

[0147] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.

[0148] It will be understood by those skilled in the art that all or some of the steps and systems in the methods disclosed above can be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components can be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software can be distributed on a computer-readable medium, which can include computer storage media (or non-transitory media) and communication media (or transient media). As is known to those skilled in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cartridges, magnetic tape, disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and is accessible to a computer. Furthermore, as is known to those skilled in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in modulated data signals such as carrier waves or other transmission mechanisms, and may include any information delivery medium.

Claims

1. A data acquisition method for an Internet simulation scenario, characterized in that it comprises: performing vulnerability scanning on the Internet simulation scenario, and classifying the vulnerabilities obtained from the scan to obtain a vulnerability set; associating the vulnerability set with a preset attack type classification dataset to obtain the basic attack corresponding to the vulnerability set, wherein the attack type classification dataset includes the correspondence between vulnerabilities and intrusion attack behaviors, and the basic attack represents the intrusion attack behavior in the Internet simulation scenario; determining the data source on the terminal side based on the basic attack and a preset attack model framework, and determining the traffic detection rules on the traffic side based on the basic attack and a preset traffic detection rule base, wherein the attack model framework includes the correspondence between attack types and data sources, and the traffic detection rule base includes the correspondence between vulnerability types and traffic detection rules; and performing targeted data collection based on the correspondence between the vulnerability set, the data source, and the traffic detection rules.

2. The data acquisition method according to claim 1, characterized in that performing vulnerability scanning on the Internet simulation scenario and classifying the vulnerabilities obtained from the scan to obtain a vulnerability set comprises: determining the network resources and network topology of the Internet simulation scenario; scanning the Internet simulation scenario based on the network resources and network topology to obtain multiple vulnerabilities, wherein the vulnerabilities correspond to the assets in the Internet simulation scenario; and grouping the multiple vulnerabilities according to the classification information in the publicly disclosed CVEs to obtain the vulnerability set.

3. The data acquisition method according to claim 1, characterized in that the attack type classification dataset is the attack type enumeration and classification dataset CAPEC; and associating the vulnerability set with a preset attack type classification dataset to obtain the basic attack corresponding to the vulnerability set comprises: associating the vulnerability set with the correspondence between vulnerabilities and intrusion attack behaviors in CAPEC and with preset attack-vulnerability association information to obtain the correspondence between the vulnerability set and the basic attack, wherein the attack-vulnerability association information serves as a supplementary set to the correspondence between vulnerabilities and intrusion attack behaviors in CAPEC.

4. The data acquisition method according to claim 1, characterized in that determining the data source on the terminal side based on the basic attack and the preset attack model framework comprises: determining the attack model framework; and establishing a mapping between the basic attack and the attack types in the attack model framework, and determining the target data source corresponding to the basic attack.

5. The data acquisition method according to claim 4, characterized in that determining the attack model framework comprises: determining, based on the Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) framework, the tactics, techniques, sub-techniques, and corresponding data sources in the framework content; and constructing the attack model framework based on the determined tactics, techniques, sub-techniques, and corresponding data sources.

6. The data acquisition method according to claim 1, characterized in that determining the traffic detection rules on the traffic side based on the basic attack and the preset traffic detection rule base comprises: determining the traffic detection rule base; and establishing a mapping between the vulnerability set corresponding to the basic attack and the vulnerability types in the traffic detection rule base, and determining the target traffic detection rule corresponding to the basic attack.

7. The data acquisition method according to claim 6, characterized in that determining the traffic detection rule base comprises: collecting a Snort rule set and performing structured processing on the data in the Snort rule set; and constructing the traffic detection rule base based on the structured Snort rule set.

8. The data acquisition method according to claim 1, characterized in that performing targeted data collection based on the correspondence between the vulnerability set, the data source, and the traffic detection rules comprises: determining the current vulnerability set and the current basic attack corresponding to the current vulnerability set; determining the data source to be collected based on the current basic attack; and determining the traffic detection rules to be used based on the current vulnerability set.

9. A data acquisition device for Internet simulation scenarios, characterized in that it comprises: a scanning module, used to perform vulnerability scanning on the Internet simulation scenario and to classify the vulnerabilities obtained from the scan to obtain a vulnerability set; a basic attack association module, used to associate the vulnerability set with a preset attack type classification dataset to obtain the basic attack corresponding to the vulnerability set, wherein the attack type classification dataset includes the correspondence between vulnerabilities and intrusion attack behaviors, and the basic attack represents the intrusion attack behavior in the Internet simulation scenario; a mapping module, used to determine the data source on the terminal side based on the basic attack and a preset attack model framework, and to determine the traffic detection rules on the traffic side based on the basic attack and a preset traffic detection rule base, wherein the attack model framework includes the correspondence between attack types and data sources, and the traffic detection rule base includes the correspondence between vulnerability types and traffic detection rules; and a targeted data collection module, used to perform targeted data collection based on the correspondence between the vulnerability set, the data source, and the traffic detection rules.

10. A data acquisition device for Internet simulation scenarios, characterized in that it comprises at least one processor and a memory communicatively connected with the at least one processor; the memory stores instructions executable by the at least one processor, which, when executed, enable the at least one processor to perform the data acquisition method as described in any one of claims 1 to 8.

11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions for causing a computer to perform the data acquisition method as described in any one of claims 1 to 8.