A network security posture assessment method, electronic equipment and storage medium
By combining information gain ratio and C4.5 decision tree, network security monitoring data is streamlined, and the impact of attacks is assessed layer by layer, solving the problems of low efficiency and accuracy in network security situation analysis and achieving efficient and accurate situation assessment.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING INST OF ENVIRONMENTAL FEATURES
- Filing Date
- 2022-10-28
- Publication Date
- 2026-06-12
AI Technical Summary
The data for cybersecurity situation analysis is complex and massive, resulting in slow processing speeds, complex classifier calculations, and poor classification performance, which affects the efficiency and accuracy of cybersecurity situation assessment.
Information gain ratio is used for attribute reduction, and a decision tree is constructed using the C4.5 method. Network security situation information is extracted through the trained classification model, and the degree of attack impact is considered at each level to calculate the network situation value.
It improves the processing efficiency and accuracy of network security situation assessment, reduces redundant data, and provides high-precision assessment results by comprehensively considering various network security factors.
Figure CN115694975B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of network communication technology, and in particular to a network security situation assessment method, electronic device, and storage medium.
Background Technology
[0002] With the continuous development of network technology and the frequent occurrence of cyberattacks, network security situation analysis has become increasingly complex. Network security monitoring data is often complex and massive, which leads to slow processing, overly complex classifier computation, and poor classification performance when network security situation information is extracted. Network security situation assessment is a crucial component of network security situation awareness: by comprehensively considering various network security factors and evaluating the severity and potential impact of attacks, it calculates a network situation value that indicates the network's security status and provides a basis for subsequent situation prediction. However, the efficiency and accuracy of the assessment are constrained by the preceding analysis and extraction steps.
Summary of the Invention
[0003] To address the problem that the complexity and sheer volume of network security situation data lead to low efficiency and accuracy in situation extraction and assessment, this invention provides a network security situation assessment method, electronic device, and storage medium. The method streamlines redundant data, improves processing efficiency, and analyzes network security threats hierarchically and comprehensively, enabling accurate assessment.
[0004] In a first aspect, embodiments of the present invention provide a network security situation assessment method, including:
[0005] Obtain network security monitoring data for a preset duration;
[0006] Based on the information gain ratio, attribute reduction is performed on the network security monitoring data to obtain a reduced set of situational elements;
[0007] Based on the reduced set of situational elements, classify the data using a trained classification model to extract network security situational information, which includes the type of each attack, the number of occurrences of each attack within the preset duration, and the corresponding service and host;
[0008] Determine the situation value of each attack based on its type;
[0009] For each service, determine the situation value of the service based on the number of occurrences and the situation value of each attack corresponding to the service within the preset duration;
[0010] Determine the weight of each service based on the number of accesses to the service within the preset duration, where the number of accesses to a service is the sum of the numbers of occurrences of the attacks corresponding to that service;
[0011] For each host, determine the situation value of the host based on the situation value and weight of each service corresponding to the host within the preset duration;
[0012] Determine the weight of each host based on the number of accesses and the situation value of each service corresponding to the host within the preset duration;
[0013] Determine the network situation value based on the situation values and weights of all hosts in the network within the preset duration.
[0014] Optionally, performing attribute reduction on the network security monitoring data based on the information gain ratio to obtain a reduced set of situational elements includes:
[0015] Based on the network security monitoring data, calculate the information gain ratio of each attribute;
[0016] For each attribute, if its information gain ratio is less than 1, remove the attribute; otherwise, retain it;
[0017] Based on the retained attributes, a reduced set of situational elements is obtained.
[0018] Optionally, the trained classification model is a decision tree constructed based on the C4.5 method;
[0019] The classification model is trained as follows:
[0020] Obtain a training set and a test set with classification labels;
[0021] Construct a decision tree based on the C4.5 method;
[0022] Train the constructed decision tree using the training set and the test set to obtain the trained decision tree, which serves as the classification model. During training, the tree is simplified by pessimistic pruning, in which a penalty factor of 0.5 is added to the numerator of the error rate of each leaf node.
[0023] Optionally, determining the situation value of an attack based on its type includes:
[0024] Determine the threat level of the attack based on its type;
[0025] Based on the threat level of the attack, assign a situation value from at least three levels, with a higher threat level corresponding to a larger situation value.
[0026] Optionally, assigning a situation value from at least three levels based on the threat level of the attack includes:
[0027] Representing the high, medium, and low threat levels by the situation values 3, 2, and 1, respectively.
[0028] Optionally, the situation value of the service is determined from the number of occurrences and the situation value of each attack corresponding to the service within the preset duration, using the following expression:
[0029]
[0030] where the left-hand side denotes the situation value of the j-th service within the preset duration t, $N_{jk}(t)$ denotes the number of attacks of the k-th type corresponding to the j-th service within the preset duration t, p denotes the total number of attack types corresponding to the j-th service, and the remaining term denotes the situation value of the k-th type of attack.
[0031] Optionally, determining the weight of each service based on the number of accesses to each service within a preset time period includes:
[0032] Based on the number of accesses to each service within a preset time period, at least three levels of importance are assigned, with higher-importance services corresponding to greater importance values.
[0033] The importance values of each service corresponding to the same host are normalized to obtain the weight of each service.
[0034] Optionally, determining the weight of each host based on the number of accesses and the situation value of each service corresponding to the host within the preset duration includes:
[0035] For each host, the access count of each service corresponding to the host is multiplied by that service's situation value, and the products are summed to give the host's importance;
[0036] The importance of all hosts in the network is normalized to obtain the weight of each host.
[0037] In a second aspect, embodiments of the present invention also provide an electronic device, including a memory and a processor, wherein the memory stores a computer program, and when the processor executes the computer program, it implements the method described in any embodiment of this specification.
[0038] In a third aspect, embodiments of the present invention also provide a computer-readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to perform the method described in any embodiment of this specification.
[0039] This invention provides a network security situation assessment method, electronic device, and storage medium. Attribute reduction based on the information gain ratio removes redundant items from the network security monitoring data, which reduces the data volume and improves processing speed. Furthermore, the invention considers the impact of attacks bottom-up, layer by layer, to calculate the network situation value. This approach is efficient and yields an assessment that integrates various network security factors, resulting in high accuracy.
Attached Figure Description
[0040] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0041] Figure 1 is a flowchart of a network security situation assessment method provided by an embodiment of the present invention.
Detailed Implementation
[0042] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort are within the scope of protection of the present invention.
[0043] As mentioned earlier, the complexity and sheer volume of network security monitoring data lead to slow processing speeds, complex classifier calculations, and poor classification performance. Network security situation assessment technology is a crucial component of network security situation awareness, and its efficiency and accuracy are constrained by the processing steps of network security situation analysis and extraction. Therefore, to achieve efficient and accurate network security situation assessment, it is necessary to extract the required situational elements from the complex situational information and combine them with appropriate assessment methods to ensure high accuracy while reducing the amount of data processed. In view of this, this invention provides a network security situation assessment method based on information gain ratio, attribute reduction, and hierarchical comprehensive assessment.
[0044] The following describes the specific implementation of the above concept.
[0045] Referring to Figure 1, an embodiment of the present invention provides a network security situation assessment method, which includes:
[0046] Step 100: Obtain network security monitoring data for a preset duration;
[0047] Step 102: Based on the information gain ratio, perform attribute reduction on the network security monitoring data to obtain a reduced set of situational elements;
[0048] Step 104: Based on the obtained reduced set of situational elements, classify them using the trained classification model to extract the corresponding network security situational information; the network security situational information includes the type of each attack, the number of attacks, the corresponding service and host within a preset time period;
[0049] Step 106: Determine the situation value of each attack based on its type;
[0050] The situation value of an attack indicates the threat level of the attack.
[0051] Step 108: For each service, determine the situation value of the service based on the number of occurrences and the situation value of each attack corresponding to the service within the preset duration;
[0052] Step 110: Determine the weight of each service based on the number of accesses to the service within the preset duration, where the number of accesses to a service is the sum of the numbers of occurrences of the attacks corresponding to that service;
[0053] Step 112: For each host, determine the situation value of the host based on the situation value and weight of each service corresponding to the host within the preset duration;
[0054] Step 114: Determine the weight of each host based on the number of accesses and the situation value of each service corresponding to the host within the preset duration;
[0055] Step 116: Determine the network situation value based on the situation values and weights of all hosts in the network within the preset duration.
[0056] In this embodiment of the invention, attribute reduction based on the information gain ratio removes redundant information from the network security monitoring data while retaining the attributes that contribute more information. This avoids the marked loss of classification accuracy and the distortion of class distribution that dimensionality reduction could otherwise cause. Furthermore, the invention considers the impact of attacks bottom-up, layer by layer: the situation value of each service is determined from the number and threat level of the attacks on it, the situation value of each host is then determined by combining the number of accesses to its services, and finally the network situation value is calculated over all hosts. This approach is efficient and yields an assessment that integrates the security factors of the entire network, so accuracy remains high even though the input monitoring data has undergone attribute reduction.
[0057] Optionally, after step 100 and before step 102, the method further includes:
[0058] The network security monitoring data obtained in step 100 is preprocessed, including erroneous data removal, data normalization, and data type conversion, to obtain standardized network security monitoring data.
[0059] By employing the above embodiments, a series of preprocessing operations, such as data normalization and data type conversion, can be performed to remove irrelevant information from the network security monitoring data and to standardize the data format.
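The preprocessing described above can be illustrated with a minimal Python sketch. The source does not specify record formats or cleaning rules, so the dictionary-based records, the field names, the "no negative values" sanity rule, and the choice of min-max scaling for normalization are all assumptions made for illustration.

```python
def preprocess(records, numeric_fields):
    """Clean raw monitoring records (dicts of strings) and min-max scale numeric fields.

    Hypothetical helper: record layout and cleaning rules are assumptions.
    """
    cleaned = []
    for rec in records:
        try:
            row = dict(rec)
            for field in numeric_fields:
                # Data type conversion: numeric fields arrive as strings.
                row[field] = float(rec[field])
        except (KeyError, TypeError, ValueError):
            continue  # Erroneous data removal: drop records that cannot be converted.
        if any(row[field] < 0 for field in numeric_fields):
            continue  # Assumed sanity rule: counts and durations cannot be negative.
        cleaned.append(row)

    # Data normalization: scale each numeric field to [0, 1].
    for field in numeric_fields:
        values = [row[field] for row in cleaned]
        if not values:
            break
        low, high = min(values), max(values)
        span = high - low
        for row in cleaned:
            row[field] = (row[field] - low) / span if span else 0.0
    return cleaned
```

Records that fail conversion are dropped rather than repaired, which keeps the sketch short; a real pipeline might impute or log them instead.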
[0060] Optionally, step 102 includes:
[0061] Based on the network security monitoring data, calculate the information gain ratio of each attribute;
[0062] For each attribute, if its information gain ratio is less than 1, remove the attribute; otherwise, retain it;
[0063] Based on the retained attributes, a reduced set of situational elements is obtained.
[0064] In machine learning, commonly used attribute selection methods include chi-square test and information gain. These methods quantify the importance of attributes before selection and reduction. Chi-square test and information gain are different approaches to quantifying attribute importance. In information gain, the measure of importance is the amount of information an attribute brings to the classification system; the more information it brings, the more important the attribute. If an attribute has a large number of distinct values, information gain will bias the selection towards that attribute, leading to overfitting. Information gain ratio improves upon this by using intrinsic information for further measurement. Intrinsic information represents the amount of information needed for each branch of the classification; the importance of an attribute decreases as its intrinsic information increases. The above embodiment uses an information gain ratio threshold of 1 as a dividing line. When the information gain ratio of an attribute is less than 1, the attribute is reduced; otherwise, it is retained. This effectively reduces redundant terms while retaining attributes more important for classification.
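As an illustration of the reduction rule, the following Python sketch computes the information gain ratio of discrete attributes in the usual C4.5 sense (information gain divided by intrinsic information) and keeps the attributes that reach a cut-off. The function names and the record layout are hypothetical. With this standard definition the ratio of a discrete attribute lies between 0 and 1, so the sketch takes the cut-off as a parameter; the source states a cut-off of 1.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((c / total) * log2(c / total) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of an attribute divided by its intrinsic information."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    intrinsic = entropy(values)  # information needed to describe the branches
    return gain / intrinsic if intrinsic > 0 else 0.0


def reduce_attributes(records, labels, threshold):
    """Return the attribute names whose gain ratio is at least `threshold`."""
    kept = []
    for name in records[0]:
        column = [rec[name] for rec in records]
        if gain_ratio(column, labels) >= threshold:
            kept.append(name)
    return kept
```

An attribute with a single distinct value has zero intrinsic information; the sketch assigns it a ratio of 0 so that it is removed.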
[0065] Optionally, the trained classification model is a decision tree constructed based on the C4.5 method.
[0066] The above embodiments use a decision tree constructed by the C4.5 method to classify the reduced set of situational elements. When a feature is selected for partitioning, the feature with the higher information gain ratio is chosen from the retained attributes. This is efficient, and because the reduced set of situational elements was itself selected by information gain ratio, a C4.5 tree suits it better than other classifiers and classifies it accurately and stably.
[0067] Furthermore, the classification model is trained in the following manner:
[0068] Obtain a training set and a test set with classification labels, meaning that all samples in the training set and the test set have classification labels;
[0069] Constructing a decision tree based on the C4.5 method;
[0070] Train the constructed decision tree using the training set and the test set to obtain the trained decision tree, which serves as the classification model. During training, the tree is simplified by pessimistic pruning, in which a penalty factor of 0.5 is added to the numerator of the error rate of each leaf node.
[0071] Pessimistic pruning is a commonly used post-pruning method. It recursively estimates the misclassification rate of the samples covered by each internal node, then compares the node's error rate before and after pruning to decide whether to prune. For a leaf node that covers N samples with E errors, the error rate of the leaf node is (E + 0.5) / N, where 0.5 is the penalty factor. Replacing a subtree that has multiple leaf nodes with a single leaf node will increase the misclassification rate on the training set, but not necessarily on new data, so an empirical penalty factor is added to the subtree's misclassification calculation. For a subtree with L leaf nodes, the misclassification rate of the subtree is:
[0072] $$\frac{\sum_{i=1}^{L} E_i + 0.5\,L}{\sum_{i=1}^{L} N_i}$$
[0073] where $E_i$ denotes the number of misclassified samples in the i-th leaf node and $N_i$ denotes the number of samples in the i-th leaf node.
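The penalized error rates can be sketched in Python as follows. The leaf and subtree rates follow the text; the pruning decision in `should_prune` is a simplified assumption (a direct comparison of the two rates), since the source does not spell out the comparison rule, and the function names are hypothetical.

```python
PENALTY = 0.5  # penalty factor added per leaf, as stated in the text


def leaf_error_rate(errors, samples):
    """Pessimistic error rate of a single leaf: (E + 0.5) / N."""
    return (errors + PENALTY) / samples


def subtree_error_rate(leaves):
    """Pessimistic error rate of a subtree given its leaves as (E_i, N_i) pairs."""
    total_errors = sum(e for e, _ in leaves)
    total_samples = sum(n for _, n in leaves)
    return (total_errors + PENALTY * len(leaves)) / total_samples


def should_prune(errors_if_collapsed, leaves):
    """Assumed rule: prune when the collapsed node is no worse than the subtree."""
    samples = sum(n for _, n in leaves)
    return leaf_error_rate(errors_if_collapsed, samples) <= subtree_error_rate(leaves)
```

Because each leaf contributes its own penalty, a subtree with many small leaves is penalized more than the single leaf that would replace it, which is what makes pruning attractive even when training error rises slightly.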
[0074] The above embodiments incorporate an empirical penalty factor into the pessimistic pruning process, which allows the decision tree to better adapt to the characteristics of network security monitoring data and improve classification accuracy. The specific construction and training process can be found in existing technologies and will not be elaborated further here.
[0075] Optionally, step 106 further includes:
[0076] Determine the threat level of an attack based on its type;
[0077] Based on the threat level of the attack, a situation value of at least three levels is defined, with higher threat levels corresponding to higher situation values.
[0078] For example, the threat level of an attack can be determined by referring to the attack classification and prioritization section of the Snort user manual. Optionally, to facilitate calculation while preserving accuracy, the high, medium, and low threat levels can be represented by the situation values 3, 2, and 1, respectively, with the threat level determined by the attack type. Table 1 shows the correspondence between some attack types, threat levels, and situation values.
[0079] Table 1. Correspondence between attack type, threat level, and situation value
[0080]
Attack type | Threat level | Attack situation value
DOS | Medium | 2
Probing | Low | 1
R2L | Medium | 2
U2R | High | 3
[0081] Optionally, the situation value of the service is determined in step 108 using the following expression:
[0082]
[0083] where the left-hand side denotes the situation value of the j-th service within the preset duration t, with j ranging from 1 to n, where n is the total number of services corresponding to the i-th host; $N_{jk}(t)$ denotes the number of attacks of the k-th type corresponding to the j-th service within the preset duration t, with k ranging from 1 to p, where p is the total number of attack types corresponding to the j-th service; and the remaining term denotes the situation value of the k-th type of attack.
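The expression itself is not reproduced in this text, so the following Python sketch rests on an assumption: the service situation value is taken to be the sum, over attack types, of the attack count multiplied by the attack's situation value. The attack values come from Table 1; the function name and the input layout are hypothetical.

```python
# Attack situation values from Table 1.
ATTACK_SITUATION_VALUE = {"DOS": 2, "Probing": 1, "R2L": 2, "U2R": 3}


def service_situation_value(attack_counts):
    """Assumed form: sum over attack types k of N_jk(t) * (situation value of k).

    attack_counts maps attack type -> number of occurrences on this service
    within the preset duration.
    """
    return sum(
        count * ATTACK_SITUATION_VALUE[attack]
        for attack, count in attack_counts.items()
    )
```

For instance, under this assumption a service that saw ten DOS attacks and one U2R attack in the window would score 10 × 2 + 1 × 3.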
[0084] Optionally, in step 110, the weight of each service is determined, including:
[0085] Based on the number of accesses to each service within a preset time period, at least three levels of importance are assigned, with higher-importance services corresponding to greater importance values.
[0086] The importance values of each service corresponding to the same host are normalized to obtain the weight of each service.
[0087] Furthermore, to facilitate calculation while preserving accuracy, the high, medium, and low importance levels can be represented by the importance values 3, 2, and 1, respectively. The importance values of the services corresponding to the same host are then normalized:
[0088] $$v_j = \frac{SW_j}{\sum SW_j}$$
[0089] where $SW_j$ denotes the importance value of the j-th service, $v_j$ denotes the weight of the j-th service, and $\sum SW_j$ denotes the sum of the importance values of all services corresponding to the same host.
[0090] For a given host, different services have different levels of importance, and the logical relationships that determine service importance are complex. To simplify and differentiate them, the above embodiment measures service importance by the number of accesses within the preset duration: a service that is accessed more often is more important. Table 2 shows one definition of the importance levels.
[0091] Table 2 Definition of Service Weight
[0092]
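A minimal Python sketch of the service weighting follows. The access-count bands are taken from claim 1 ([0, 20] low, [20, 50] medium, [50, ∞) high); because those intervals share their endpoints, the sketch assumes that counts of exactly 20 and 50 fall into the higher band. The function names are hypothetical.

```python
def importance_value(access_count):
    """Map an access count to an importance value (3 = high, 2 = medium, 1 = low).

    Assumption: boundary counts 20 and 50 belong to the higher band.
    """
    if access_count >= 50:
        return 3
    if access_count >= 20:
        return 2
    return 1


def service_weights(access_counts):
    """Normalize importance values over the services of one host (v_j = SW_j / sum SW_j)."""
    importance = {svc: importance_value(n) for svc, n in access_counts.items()}
    total = sum(importance.values())
    return {svc: sw / total for svc, sw in importance.items()}
```

Since every service receives an importance value of at least 1, the normalization never divides by zero for a host with one or more services.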
[0093] Optionally, in step 112, the situation value of the host is determined using the following expression:
[0094]
[0095] where the left-hand side denotes the situation value of the i-th host within the preset duration t, n denotes the total number of services corresponding to the i-th host, $v_j$ denotes the weight of the j-th service, and the remaining term denotes the situation value of the j-th service within the preset duration t.
[0096] Optionally, step 114 further includes:
[0097] For each host, the access count of each service corresponding to the host is multiplied by that service's situation value, and the products are summed to give the host's importance. The expression is as follows:
[0098]
[0099] where the two factors in each product are the situation value of the j-th service within the preset duration t and the number of accesses to the j-th service within the preset duration t, and $HW_i$ denotes the importance of the i-th host, with i ranging from 1 to m, where m is the total number of hosts in the network.
[0100] The importance of all hosts in the network is normalized to obtain the weight of each host, expressed as:
[0101] $$w_i = \frac{HW_i}{\sum HW_i}$$
[0102] where $w_i$ denotes the weight of the i-th host and $\sum HW_i$ denotes the sum of the importance of all hosts in the network.
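The host weighting can be sketched in Python as below, following the description of step 114: importance is the sum of access count times service situation value, and weights are obtained by normalizing over all hosts. The function names and dictionary inputs are hypothetical, and returning zero weights when no host has any importance is an assumption for the degenerate case, which the source does not discuss.

```python
def host_importance(access_counts, service_values):
    """HW_i: sum over the host's services of access count * service situation value."""
    return sum(access_counts[svc] * service_values[svc] for svc in access_counts)


def host_weights(importance_by_host):
    """w_i = HW_i / sum of HW over all hosts in the network."""
    total = sum(importance_by_host.values())
    if total == 0:
        # Assumed handling: no attacks observed anywhere, so no host carries weight.
        return {host: 0.0 for host in importance_by_host}
    return {host: hw / total for host, hw in importance_by_host.items()}
```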
[0103] The above embodiments treat a host that provides more services with high situation values as more important, and therefore assign it a higher weight. Both the situation values of a host's services and their access frequency thus raise the host's importance, so that the security risk faced by each host is assessed accurately.
[0104] Optionally, in step 116, the network situation value is determined using the following expression:
[0105]
[0106] where $R_L(t)$ denotes the network situation value within the preset duration t, and m denotes the total number of hosts in the network.
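Steps 106 to 116 can be tied together in one hedged Python sketch. Several parts are assumptions because the expressions are not reproduced in this text: the service, host, and network situation values are taken to be weighted sums, the access-count bands follow claim 1 with boundary counts assigned to the higher band, and the nested-dictionary input and the function names are hypothetical.

```python
ATTACK_VALUE = {"DOS": 2, "Probing": 1, "R2L": 2, "U2R": 3}  # Table 1


def level(access_count):
    # Assumed banding from claim 1: below 20 low, 20 to 49 medium, 50 and above high.
    return 3 if access_count >= 50 else 2 if access_count >= 20 else 1


def assess(network):
    """Bottom-up network situation value for one preset duration.

    network: {host: {service: {attack_type: count}}}
    """
    host_value, host_importance = {}, {}
    for host, services in network.items():
        # Accesses to a service = total number of attacks on it (step 110).
        accesses = {s: sum(a.values()) for s, a in services.items()}
        # Assumed service situation value: count-weighted sum of attack values (step 108).
        values = {
            s: sum(n * ATTACK_VALUE[k] for k, n in a.items())
            for s, a in services.items()
        }
        levels = {s: level(c) for s, c in accesses.items()}
        total_level = sum(levels.values())
        # Assumed host situation value: service values weighted by v_j (step 112).
        host_value[host] = sum(levels[s] / total_level * values[s] for s in services)
        # Host importance HW_i: access count times service value, summed (step 114).
        host_importance[host] = sum(accesses[s] * values[s] for s in services)

    total_importance = sum(host_importance.values())
    if total_importance == 0:
        return 0.0
    # Assumed network situation value: host values weighted by w_i (step 116).
    return sum(
        host_importance[h] / total_importance * host_value[h] for h in network
    )
```

Calling `assess` once per preset duration would yield the time series of network situation values that the text mentions as input to later prediction.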
[0107] The resulting time series of network situation values can be used for subsequent situation prediction.
[0108] This invention also provides an electronic device, including a memory and a processor. The memory stores a computer program, and when the processor executes the computer program, it implements a network security situation assessment method according to any embodiment of this invention.
[0109] This invention also provides a computer-readable storage medium storing a computer program, which, when executed by a processor, causes the processor to perform a network security situation assessment method according to any embodiment of this invention.
[0110] Specifically, a system or apparatus equipped with a storage medium may be provided, on which software program code implementing the functions of any of the embodiments described above is stored, and the computer (or CPU or MPU) of the system or apparatus may read and execute the program code stored in the storage medium.
[0111] In this case, the program code read from the storage medium can itself implement the function of any of the above embodiments, and therefore the program code and the storage medium storing the program code constitute part of the present invention.
[0112] Examples of storage media used to provide program code include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), magnetic tapes, non-volatile memory cards, and ROMs. Alternatively, program code can be downloaded from a server computer via a communication network.
[0113] Furthermore, it should be clear that not only can the program code read by the computer be executed, but also the operating system or other components operating on the computer can be instructed based on the program code to perform some or all of the actual operations, thereby realizing the function of any of the embodiments described above.
[0114] Furthermore, it is understood that the program code read from the storage medium may be written to a memory provided on an expansion board inserted into the computer or to a memory provided in an expansion module connected to the computer. Then, based on the instructions of the program code, the CPU or other components installed on the expansion board or expansion module execute some or all of the actual operations, thereby realizing the function of any of the above embodiments.
[0115] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0116] Those skilled in the art will understand that all or part of the steps of the above method embodiments can be implemented by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium. When the program is executed, it performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disk, or optical disk.
[0117] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A network security situation assessment method, characterized in that it comprises: obtaining network security monitoring data for a preset duration; performing attribute reduction on the network security monitoring data based on the information gain ratio to obtain a reduced set of situational elements, including: calculating, based on the network security monitoring data, the information gain ratio of each attribute; for each attribute, removing the attribute if its information gain ratio is less than 1 and otherwise retaining it; and obtaining the reduced set of situational elements from the retained attributes; classifying, based on the reduced set of situational elements, the data using a trained classification model to extract network security situational information, which includes the type of each attack, the number of occurrences of each attack within the preset duration, and the corresponding service and host; determining the situation value of each attack based on its type; for each service, determining the situation value of the service based on the number of occurrences and the situation value of each attack corresponding to the service within the preset duration; determining the weight of each service based on the number of accesses to the service within the preset duration, including: assigning one of at least three importance levels based on the number of accesses to the service within the preset duration, wherein a service with an access count in [0, 20] is of low importance, a service with an access count in [20, 50] is of medium importance, and a service with an access count in [50, ∞) is of high importance, and a service of higher importance is assigned a greater importance value; and normalizing the importance values of the services corresponding to the same host to obtain the weight of each service, wherein the number of accesses to a service is the sum of the numbers of occurrences of the attacks corresponding to that service; for each host, determining the situation value of the host based on the situation value and weight of each service corresponding to the host within the preset duration; determining the weight of each host based on the number of accesses and the situation value of each service corresponding to the host within the preset duration, including: for each host, multiplying the access count of each service corresponding to the host by that service's situation value and summing the products to determine the host's importance; and normalizing the importance of all hosts in the network to obtain the weight of each host; and determining the network situation value based on the situation values and weights of all hosts in the network within the preset duration.
2. The method according to claim 1, characterized in that the trained classification model is a decision tree constructed based on the C4.5 method, and the classification model is trained as follows: obtaining a training set and a test set with classification labels; constructing a decision tree based on the C4.5 method; and training the constructed decision tree using the training set and the test set to obtain the trained decision tree, which serves as the classification model, wherein during training the tree is simplified by pessimistic pruning, in which a penalty factor of 0.5 is added to the numerator of the error rate of each leaf node.
3. The method according to claim 1, characterized in that determining the situation value of an attack based on its type includes: determining the threat level of the attack based on its type; and assigning, based on the threat level of the attack, a situation value from at least three levels, with a higher threat level corresponding to a larger situation value.
4. The method according to claim 3, characterized in that assigning a situation value from at least three levels based on the threat level of the attack includes: representing the high, medium, and low threat levels by the situation values 3, 2, and 1, respectively.
5. The method according to claim 4, characterized in that the situation value of the service is determined, based on the number of occurrences and the situation value of each attack corresponding to the service within the preset duration, using an expression whose terms are defined as follows: the result is the situation value of the j-th service within the preset duration t; one term is the number of attacks of the k-th type corresponding to the j-th service within the preset duration t; p is the total number of attack types corresponding to the j-th service; and the other term is the situation value of the k-th type of attack.
6. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that when the processor executes the computer program, it implements the method according to any one of claims 1-5.
7. A storage medium having a computer program stored thereon, characterized in that when the computer program is executed in a computer, it causes the computer to perform the method according to any one of claims 1-5.