APT inter-organizational relationship quantitative analysis method
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING UNIV OF POSTS & TELECOMM
- Filing Date
- 2022-08-24
- Publication Date
- 2026-06-12
AI Technical Summary
Existing technologies struggle to quantify the relationships between APT organizations effectively. Because they rely on expert knowledge and shallow features, they are easily confused by obfuscation and cannot adapt to complex, ever-changing network environments.
Rough set theory is used to construct an APT organizational behavior pattern model. Combining the Jaccard coefficient with a common neighbor relationship coefficient, the relationships between APT organizations are quantified through automated feature selection and data processing.
The method quantifies the relationships between APT organizations accurately, reduces reliance on expert experience, adapts to complex network environments, and improves the accuracy and efficiency of association analysis.
Figure CN117668825B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of threat investigation, and in particular relates to the quantitative analysis of relationships between APT organizations.
Background Technology
[0002] In recent years, with the rapid development of informatization, networking, and globalization, the scale of cyberspace has expanded, network topology has become more complex, and terminals have become more diverse, leading to an increased attack surface, heightened vulnerabilities, and a growing number of security problems. At the same time, cyberattack methods have become increasingly complex and diverse. Advanced persistent threats (APTs), with their targeted, persistent, and stealthy nature, have become a significant threat to cyberspace security. APT attacks are typically controlled by well-funded and well-organized threat actors, and often conceal underlying political, economic, and military intentions, posing a major threat to the critical assets of governments and enterprises. They often enhance their stealth and evade attribution by designing complex variants, greatly hindering attack investigations.
[0003] As more and more attacks are disclosed, potential connections between some organizations are also emerging. Existing research mainly focuses on tracing malicious samples, profiling attack organizations, and identifying behavioral patterns, with little analysis of inter-organizational relationships. Studying the relationships between APT organizations is a current technical challenge in APT attack tracing and is of great significance for attack investigation, organization profiling, and the construction of APT attack maps. Defining APT organizations reasonably and standardizing how their relationships are measured are therefore crucial.
[0004] Some security companies rely on strong expert teams and combat experience to track advanced threats and understand the relationships between APT groups. By examining the resources used in cyberattacks, such as IP addresses, domain names, and vulnerabilities (time-sensitive, shallow characteristics), and by comparing signatures and string similarities in malicious code, they can identify organizational connections. However, these methods mostly capture only shallow connections between APT groups and are easily misled by obfuscation techniques.
[0005] The prerequisite for quantitative analysis of relationships between APT organizations is an accurate characterization of APT organizations together with an effective relationship measurement method. Existing attacker behavior characterization typically describes attacker behavior from two perspectives, malware and security incidents, in order to map cyberattacks to attackers. Associations between attack targets reflect, to some extent, the geopolitical nature of APT organizations, while associations between attack behaviors reflect organizational origin relationships. However, attack target features or attack behavior features alone are one-sided: each identifies only a specific origin or cooperative relationship, even though these two kinds of relationships are often not independent. Feature fusion overcomes the limitations of a single dimension and provides a comprehensive characterization of APT organizations.
[0006] Existing research on the analysis of relationships between APT organizations is limited, and most studies rely on feature sets or inference rules defined by expert knowledge for similarity calculation, which cannot adapt to complex and ever-changing network environments. Rough set theory is a well-known mathematical tool for handling imprecise, inconsistent, and incomplete information and knowledge. It characterizes sets that cannot be precisely defined through their lower and upper approximations, and can effectively extract similar attack behaviors and target features shared between organizations from large-scale, disordered threat intelligence, thus enabling the relationships between APT organizations to be quantified.
Summary of the Invention
[0007] This invention proposes a quantitative analysis method for the relationship between APT organizations. It uses automatically generated APT organization behavior patterns to dynamically calculate the inter-organizational correlation of different APT attacks, and the relationship coefficient can effectively reflect the degree of correlation between organizations.
[0008] This invention provides a method for quantitative analysis of inter-organizational relationships in APT, comprising the following steps:
[0009] 1) Extract structured security data from open-source technical reports to generate APT organization knowledge representations that integrate attack target characteristics and attack behavior characteristics, effectively characterizing APT organizations, and reduce the attribute set through automated feature selection to retain highly discriminative features.
[0010] 2) Construct an APT organizational behavior pattern model based on rough set theory: calculate the rough set membership degree of each feature sequence and, according to the inaccuracy of the feature sequences, divide the knowledge representation into three parts, the precise domain, the fuzzy domain, and the irrelevant domain, thereby dynamically generating APT organizational behavior patterns through lower and upper approximations.
[0011] 3) The relationship between APT organizations is quantified using the behavioral pattern fuzzy domain. The Jaccard coefficient is calculated as the direct relationship coefficient between organizations. At the same time, considering the network topology and path weight, the common neighbor relationship coefficient between organizations is defined. The direct relationship coefficient and the common neighbor relationship coefficient are added to obtain the relationship similarity between APT organizations, and an APT organization relationship network is generated.
[0012] Furthermore, the APT inter-organizational relationship quantification process includes:
[0013] a) Apply the Jaccard coefficient to the fuzzy domain of APT organizational behavior, calculate the inter-organizational direct relationship coefficient DIR, and generate the APT organizational direct relationship network;
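A minimal sketch of step a), assuming each organization's fuzzy domain is already available as a Python set of feature-sequence identifiers. The function names, the organization labels, and the choice to keep only non-zero coefficients as edges of the direct relationship network are illustrative assumptions, not the patent's implementation.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def direct_relationship(fuzzy: dict[str, set]) -> dict[tuple[str, str], float]:
    """Compute DIR for every unordered pair of organizations (keys are sorted tuples).

    Pairs whose fuzzy domains do not overlap get no edge (hypothetical design choice).
    """
    dir_edges = {}
    for gi, gj in combinations(sorted(fuzzy), 2):
        score = jaccard(fuzzy[gi], fuzzy[gj])
        if score > 0:
            dir_edges[(gi, gj)] = score
    return dir_edges
```

For example, fuzzy domains `{"APT-A": {"s1", "s2", "s3"}, "APT-B": {"s2", "s3", "s4"}, "APT-C": {"s5"}}` yield a single edge `("APT-A", "APT-B")` with DIR 0.5, since the two sets share 2 of 4 distinct sequences.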
[0014] b) Calculate the weight of each path that starts at one APT organization node, ends at another, and passes through their common neighbor nodes. The path weight is defined as the minimum distance along the path divided by the number of hops:
[0015] $w(p)=\dfrac{\min_{\langle x,y\rangle\in p} DIR(x,y)}{|p|}$
[0016] where each path is a sequence of adjacent node pairs, $p=\{\langle u,v\rangle,\dots,\langle x,y\rangle,\langle y,z\rangle \mid u,v,x,y,z\in D\}$; $D$ is the set of APT organization nodes; $DIR(x,y)$ is the direct relationship coefficient of edge $\langle x,y\rangle$ computed in step a); and $|p|$ is the number of hops in $p$;
[0017] c) Because two APT organizations may share multiple common neighbor nodes, and multiple paths may pass through these nodes, calculate the common neighbor relationship coefficient $CN(g_i,g_j)$ between the two organizations, defined as the sum of the weights of all paths that start and end at the node pair $\langle g_i,g_j\rangle$ and pass through the common neighbor nodes of $g_i$ and $g_j$:
[0018] $CN(g_i,g_j)=\sum_{p\in P(g_i,g_j)} w(p)$, where $P(g_i,g_j)$ is the set of such paths.
[0019] Since different nodes lie in different local networks, the results are normalized.
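The path weight and common neighbor coefficient of steps b) and c) can be sketched as follows. Assumptions: the direct relationship network is an undirected graph stored as nested dicts whose edge values are the DIR coefficients; the "distance" in the path weight is taken to be that DIR value; and, following Step 302 ([0059]), only indirect paths of 2 or 3 hops are enumerated (`max_hops=3`). Normalization is left out because the patent does not specify the scheme. All names are hypothetical.

```python
def build_graph(dir_edges: dict[tuple[str, str], float]) -> dict[str, dict[str, float]]:
    """Turn {(gi, gj): DIR} into an undirected adjacency map."""
    graph: dict[str, dict[str, float]] = {}
    for (u, v), value in dir_edges.items():
        graph.setdefault(u, {})[v] = value
        graph.setdefault(v, {})[u] = value
    return graph


def path_weight(graph: dict, path: list[str]) -> float:
    """Minimum edge value along the path divided by its number of hops."""
    hops = len(path) - 1
    return min(graph[u][v] for u, v in zip(path, path[1:])) / hops


def indirect_paths(graph: dict, src: str, dst: str, max_hops: int = 3):
    """Yield simple src->dst paths of 2..max_hops hops (the direct edge is skipped)."""
    stack = [[src]]
    while stack:
        path = stack.pop()
        for nxt in graph.get(path[-1], {}):
            if nxt in path:
                continue  # keep paths simple: no repeated nodes
            extended = path + [nxt]
            if nxt == dst:
                if len(extended) > 2:  # at least one intermediate neighbor node
                    yield extended
            elif len(extended) - 1 < max_hops:
                stack.append(extended)  # still room for another hop


def common_neighbor_coefficient(graph: dict, gi: str, gj: str, max_hops: int = 3) -> float:
    """Unnormalized CN(gi, gj): sum of the weights of all indirect gi-gj paths."""
    return sum(path_weight(graph, p) for p in indirect_paths(graph, gi, gj, max_hops))
```

With hypothetical edges A-B 0.5, A-C 0.4, B-C 0.2, C-D 0.6, B-D 0.3, the indirect A-B paths are A-C-B (weight 0.2/2) and A-C-D-B (weight 0.3/3), so CN(A, B) is about 0.2. A graph library such as NetworkX offers `all_simple_paths` with a `cutoff` argument that could replace the hand-written enumeration.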
[0020] d) The direct relationship coefficients and the common neighbor relationship coefficients are summed and normalized to obtain the correlation coefficients between APT organizations and generate an APT organization relationship network;
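A sketch of step d), assuming DIR and CN scores are keyed by the same sorted organization pair and that "normalized" means scaling by the maximum value. The patent does not state which normalization it uses, so `normalize` and the order of operations here are assumptions.

```python
def normalize(scores: dict) -> dict:
    """Scale non-negative scores into [0, 1] by dividing by the maximum (assumed scheme)."""
    top = max(scores.values(), default=0.0)
    return {k: v / top for k, v in scores.items()} if top > 0 else dict(scores)


def relationship_coefficients(dir_scores: dict, cn_scores: dict) -> dict:
    """Normalize CN, add DIR, then normalize the sum into the final coefficient.

    Both inputs must use one key order per pair, e.g. sorted tuples ("A", "B").
    """
    cn_norm = normalize(cn_scores)
    pairs = set(dir_scores) | set(cn_norm)
    combined = {p: dir_scores.get(p, 0.0) + cn_norm.get(p, 0.0) for p in pairs}
    return normalize(combined)
```

A pair that appears only in the direct network or only through common neighbors still receives a coefficient, because missing scores default to 0.0.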
[0021] e) Based on the temporal characteristics of behavioral patterns, generate APT organizational relationship networks at different time intervals, observe relationship changes, and analyze evolutionary patterns.
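One way to realize step e) is to bucket the samples by time and rerun steps 1) to 3) on each bucket. Fixed-width year windows, the `(org, feature_sequence, event_date)` tuple layout, and the helper name are assumptions; the patent does not specify how the interval boundaries are chosen.

```python
from collections import defaultdict
from datetime import date


def split_by_interval(samples, start_year: int, width_years: int) -> dict:
    """Bucket (org, feature_sequence, event_date) samples into fixed-width year windows."""
    windows = defaultdict(list)
    for org, sequence, when in samples:
        index = (when.year - start_year) // width_years
        first = start_year + index * width_years
        windows[(first, first + width_years - 1)].append((org, sequence))
    return dict(windows)


# Hypothetical samples; each window would feed its own relationship network.
samples = [
    ("APT-A", "seq1", date(2014, 3, 1)),
    ("APT-B", "seq2", date(2016, 7, 9)),
    ("APT-A", "seq3", date(2017, 1, 20)),
]
windows = split_by_interval(samples, start_year=2014, width_years=2)
```

Comparing the networks built from consecutive windows is what exposes the relationship changes the step describes.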
[0022] Furthermore, the attack target data in step 1) comes from the cybersecurity incident dataset Hackmageddon, and some incidents are manually extracted from threat intelligence according to a predefined data structure.
[0023] Further, the attack target feature representation in step 1) is the set of sequences used to describe the attack target characteristics of APT groups, defined as:
[0024] $Target=\langle U^e, At^e, \{V_a^e \mid a\in At^e\}, \{I_a^e \mid a\in At^e\}\rangle$
[0025] where $U^e$ is a set of security events; $At^e=C^e\cup D$ is the attribute set; the conditional attribute set $C^e$ includes time, target type, target industry, target geolocation, attack method, attack tool, propagation carrier, and vulnerability exploitation; the decision attribute set $D$ is the set of APT organizations; $V_a^e$ is the value range of attribute $a\in At^e$; $I_a^e:U^e\to V_a^e$ is the information function; and for $A\subseteq At^e$, $I_A^e(x)$ denotes the attribute values of object $x\in U^e$ on attribute set $A$.
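The information system $Target$ can be held as a mapping from event ids to attribute values. The sketch below, with hypothetical event records and attribute names, computes the equivalence classes $[x]$ induced by the conditional attributes, which the rough set steps of step 2) rely on. The decision attribute (`org`) is deliberately left out of the grouping key.

```python
from collections import defaultdict


def equivalence_classes(universe: dict, condition_attrs: list) -> list[set]:
    """Partition object ids by their values on the conditional attributes.

    Objects with identical values I_A(x) on A = condition_attrs are indiscernible
    and therefore fall into the same equivalence class [x].
    """
    classes = defaultdict(set)
    for obj_id, record in universe.items():
        key = tuple(record[a] for a in condition_attrs)
        classes[key].add(obj_id)
    return list(classes.values())


# Hypothetical security events: e1 and e2 look identical on the conditional attributes.
events = {
    "e1": {"target_industry": "energy", "attack_method": "spear-phishing", "org": "APT-A"},
    "e2": {"target_industry": "energy", "attack_method": "spear-phishing", "org": "APT-B"},
    "e3": {"target_industry": "finance", "attack_method": "watering-hole", "org": "APT-A"},
}
classes = equivalence_classes(events, ["target_industry", "attack_method"])
```

Here `e1` and `e2` form one class even though they belong to different organizations, which is exactly the situation that later places them in a fuzzy domain.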
[0026] Furthermore, the attack behavior data in step 1) comes from the APT group malware hashes extracted from threat intelligence, and malware behavior data is obtained from the VirusTotal platform.
[0027] Further, the attack behavior feature representation in step 1) is the set of sequences used to describe the attack behavior characteristics of APT groups, defined as:
[0028] $Behavior=\langle U^m, At^m, \{V_a^m \mid a\in At^m\}, \{I_a^m \mid a\in At^m\}\rangle$
[0029] where $U^m$ is a collection of malware; $At^m=C^m\cup D$ is the attribute set; the conditional attribute set $C^m$ includes time, static characteristics, dynamic characteristics, and vulnerability characteristics; $V_a^m$ is the value range of attribute $a\in At^m$; $I_a^m:U^m\to V_a^m$ is the information function; and for $A\subseteq At^m$, $I_A^m(x)$ denotes the attribute values of object $x\in U^m$ on attribute set $A$.
[0030] Furthermore, in step 1), APT groups exhibit complex and diverse attack behaviors, and the malware behavior features collected from VirusTotal are highly redundant, so the conditional attribute set representing attack behavior features is extremely large. A feature selection algorithm based on mutual information is therefore adopted to reduce the conditional attribute set: it measures the dependency between conditional attributes and decision attributes and retains the feature subset that provides as much "information" as possible for decision-making.
Further, in step 2), the rough set membership degree is defined as follows:
[0031] $\mu_{V(g_k)}(x)=\dfrac{|[x]\cap V(g_k)|}{|[x]|}$
[0032] where $V(g_k)$ is the sample set of APT organization $g_k$ and $[x]$ is the equivalence class of feature sequence $x$.
[0033] Furthermore, the precise domain in step 2) consists of samples with a rough set membership degree greater than or equal to 1. It contains the feature sequences that characterize the unique behavioral patterns of an APT organization, i.e., feature sequences that represent this APT organization and no other.
[0034] Furthermore, the fuzzy domain in step 2) consists of samples with a rough set membership degree greater than 0 and less than 1. It contains feature sequences that are associated with the APT organization but not exclusively, i.e., they also belong to other APT organizations. A non-empty fuzzy domain shows that the organization shares similar behavioral patterns with other APT organizations, which may indicate technical exchange or strategic cooperation, i.e., homologous or cooperative relationships. Similar behavioral patterns between different APT organizations are therefore an important reference for measuring their correlation.
[0035] Furthermore, in step 2), the irrelevant domain consists of samples with a rough set membership degree less than or equal to 0, representing feature sequences that are completely unrelated to the organization, i.e., behavioral patterns that it does not possess.
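A sketch of the three-way split in [0033] to [0035], assuming the membership degree takes the standard rough membership form $|[x]\cap V(g_k)|/|[x]|$ and that equivalence classes are given as sets of sample ids. Every sample in a class shares the same membership degree, so the loop works class by class. Function names are hypothetical.

```python
def rough_membership(eq_class: set, org_samples: set) -> float:
    """mu(x) = |[x] ∩ V(g_k)| / |[x]| for any sample x in eq_class."""
    return len(eq_class & org_samples) / len(eq_class)


def three_domains(classes: list[set], org_samples: set) -> tuple[set, set, set]:
    """Split samples into (precise, fuzzy, irrelevant) domains by membership degree."""
    precise, fuzzy, irrelevant = set(), set(), set()
    for eq_class in classes:
        mu = rough_membership(eq_class, org_samples)
        if mu >= 1:
            precise |= eq_class      # sequences unique to this organization
        elif mu > 0:
            fuzzy |= eq_class        # sequences shared with other organizations
        else:
            irrelevant |= eq_class   # sequences the organization never shows
    return precise, fuzzy, irrelevant
```

For classes `{e1, e2}`, `{e3, e4}`, `{e5}` and an organization owning `{e1, e2, e3}`, the membership degrees are 1, 0.5, and 0, giving precise `{e1, e2}`, fuzzy `{e3, e4}`, and irrelevant `{e5}`.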
[0036] The method of this invention can effectively define the behavioral patterns of APT organizations and quantify inter-organizational relationships, and has the following advantages compared with the prior art:
[0037] 1. This invention uses easily collected hashes of malicious samples to obtain the behavioral characteristics of malware. Through a feature selection algorithm, it automatically extracts highly discriminative features from large-scale, disordered data, reducing human intervention in data collection.
[0038] 2. This invention proposes an attack behavior pattern model suitable for APT organization association analysis by combining rough set theory. It can effectively characterize the characteristics of APT organizations, support the dynamic expansion of new knowledge, and greatly reduce the dependence on expert experience and knowledge base.
[0039] 3. This invention designs a similarity calculation method based on node correlation degree, which reasonably quantifies the relationships between organizations and fills a gap in research on measurement standards for inter-organizational relationships in APT.
Attached Figure Description
[0040] Figure 1 is a flowchart of the quantitative analysis of inter-organizational relationships in APT using the method of this invention.
[0041] Figure 2 compares the relationship quantification results of the method of the present invention across different APT organization knowledge representation dimensions.
[0042] Figure 3 is an APT organizational relationship network diagram generated by the method of the present invention.
[0043] Figure 4 is a relationship evolution diagram generated by the method of the present invention after dividing time intervals according to the temporal characteristics of behavioral patterns.
Detailed Implementation
[0044] To make the above-mentioned features and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to specific embodiments and accompanying drawings.
[0045] The inter-organizational relationship quantification method designed in this invention is based on the APT organizational behavior patterns defined by rough set theory. It extracts data suitable for quantifying inter-organizational relationships in APT from large-scale threat intelligence and generates an APT organizational relationship network. The method flow is shown in Figure 1, and its main steps are:
[0046] Step 101: Extract key attribute features of security incidents from different APT organizations from threat intelligence, including time, target type, target industry, target geolocation, attack method, attack tool, propagation carrier, and vulnerability exploitation, to form a security incident sample set and generate attack target feature representations;
[0047] Step 102: Extract the hashes of malware related to APT groups from threat intelligence, collect the dynamic and static characteristics and vulnerability information of malware from the VirusTotal tool, construct a malware sample set, and generate attack behavior feature representations;
[0048] Step 103: Perform data preprocessing on the malware sample set, standardize the writing format, remove noise from the detection environment, and remove duplicate samples and samples that do not contain effective features after preprocessing.
[0049] Step 104: The feature selection algorithm based on mutual information simplifies the set of conditional attributes representing the attack behavior features, calculates the dependency between conditional attributes and decision attributes, and retains the subset of attributes that are more relevant to the decision.
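Step 104 does not name a specific algorithm beyond "based on mutual information". The sketch below uses the simplest such filter: it scores each discrete conditional attribute by its mutual information with the decision attribute (the APT organization) and keeps the top `k`. The value of `k`, the feature names, and the ranking rule are assumptions. Variants such as mRMR also penalize redundancy among the selected features, which this sketch does not do.

```python
import math
from collections import Counter


def mutual_information(xs: list, ys: list) -> float:
    """I(X; Y) in nats for two aligned sequences of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # sum over (x, y) of p(x, y) * log(p(x, y) / (p(x) p(y))), written with counts
    return sum(c / n * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def select_features(rows: list[dict], labels: list, k: int) -> list[str]:
    """Keep the k conditional attributes with the highest MI against the labels."""
    scores = {f: mutual_information([r[f] for r in rows], labels) for f in rows[0]}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A feature that is constant across all samples (for example a behavior every malware sample exhibits) scores exactly 0 and is dropped first, which matches the goal of removing redundant VirusTotal features.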
[0050] Step 105: The attack target feature representation and the attack behavior feature representation together constitute the APT organization's knowledge representation.
[0051] Step 201: Generate the rough set of the sample set $V(g_k)$ of APT organization $g_k$. The upper and lower approximation operators of the sample set are computed as follows:
[0052] $\overline{apr}(V(g_k))=\{x\in U \mid [x]\cap V(g_k)\neq\emptyset\}$;
[0053] $\underline{apr}(V(g_k))=\{x\in U \mid [x]\subseteq V(g_k)\}$;
[0054] Step 202: Based on the degree of inaccuracy of the feature sequences in the sample sets of different APT organizations, divide the knowledge representation of APT organizations into three parts, the precise domain, the fuzzy domain, and the irrelevant domain, and construct the APT organization behavior pattern model $Pattern(g_k)=\langle Prec(g_k),Fuzz(g_k),Irr(g_k)\rangle$. The three domains are computed as follows:
[0055] $Prec(g_k)=\underline{apr}(V(g_k))$
[0056] $Fuzz(g_k)=\overline{apr}(V(g_k))\setminus\underline{apr}(V(g_k))$
[0057] $Irr(g_k)=U\setminus\overline{apr}(V(g_k))$
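Steps 201 and 202 map directly onto Python set operations. This sketch assumes the equivalence classes partition the universe $U$ and that $V(g_k)$ is a set of sample ids; the function names and the dict keys mirroring Prec, Fuzz, and Irr are hypothetical.

```python
def approximations(classes: list[set], target: set) -> tuple[set, set]:
    """Return (lower, upper) approximations of target from the equivalence classes."""
    lower, upper = set(), set()
    for eq_class in classes:
        if eq_class <= target:   # [x] is a subset of V(g_k)
            lower |= eq_class
        if eq_class & target:    # [x] intersects V(g_k)
            upper |= eq_class
    return lower, upper


def behavior_pattern(universe: set, classes: list[set], target: set) -> dict[str, set]:
    """Pattern(g_k) as Prec = lower, Fuzz = upper minus lower, Irr = U minus upper."""
    lower, upper = approximations(classes, target)
    return {"Prec": lower, "Fuzz": upper - lower, "Irr": universe - upper}
```

Under the partition assumption, these set differences select the same samples as the membership thresholds in [0033] to [0035]: a class lies in the lower approximation exactly when its membership degree is 1, and outside the upper approximation exactly when it is 0.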
[0058] Step 301: Apply Jaccard coefficients to the fuzzy domain of APT organizational behavior, calculate the inter-organizational direct relationship coefficients (DIR), and generate the APT organizational direct relationship network.
[0059] Step 302: Considering the scale and topology of the APT organization relationship network, and to avoid excessive computation time cost, 1-hop neighbor nodes are selected to generate a connected sub-network graph with a diameter of 3, and the common neighbor relationship coefficient CN of each pair of APT organizations is calculated.
[0060] Step 303: Add the direct relationship coefficient and the common neighbor relationship coefficient to obtain the correlation coefficient between APT organizations and generate the APT organization relationship network;
[0061] Step 304: Based on the temporal characteristics of behavioral patterns, generate APT organizational relationship networks at different time intervals, observe relationship changes, and analyze evolutionary patterns.
[0062] This invention has been validated on an organizational relationship dataset collected from publicly available APT analysis reports. Figure 2 shows the relationship quantification results of the method for different APT organization knowledge representation dimensions: both the accuracy of the associations obtained with the fused features and the number of association relationships found are much higher than with a single feature dimension. Figure 3 shows the APT organizational relationship network generated by the method: correlation within related clusters is relatively high while correlation between clusters is relatively low, related organizations tend to be densely connected with each other, and the correlation coefficient effectively reflects the strength of the relationship between organizations. Figure 4 shows the relationship evolution diagram generated after dividing time intervals according to the temporal characteristics of behavioral patterns; it shows APT attacks developing from an initial stage, through conflict and disorder, to gradual stabilization, which is consistent with the expected evolutionary pattern.
[0063] The above description is merely a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A method for quantitative analysis of inter-organizational relationships in APT, characterized in that it comprises:
A. APT organization knowledge representation: structured security data samples are extracted from open-source threat intelligence to generate an APT organization knowledge representation, i.e., the representation of attack target characteristics and attack behavior characteristics used to describe the behavioral patterns of APT organizations, further comprising the following steps:
A1. Preliminarily screen and process the APT attack data collected from open-source technical reports, remove redundant and invalid samples, and extract key security events and malware features to form structured data;
A2. The attack target feature representation is a set of sequences describing the attack target characteristics of APT organizations; each feature sequence is described by a set of security event attributes and is defined as $Target=\langle U^e, At^e, \{V_a^e \mid a\in At^e\}, \{I_a^e \mid a\in At^e\}\rangle$, where $U^e$ is a set of security events; $At^e=C^e\cup D$ is the attribute set; the conditional attribute set $C^e$ includes time, target type, target industry, target geographic location, attack method, attack tools, propagation vector, and exploited vulnerabilities; the decision attribute set $D$ includes the APT group to which the security incident belongs; $V_a^e$ is the value range of attribute $a$; $I_a^e:U^e\to V_a^e$ is the information function; and for $A\subseteq At^e$, $I_A^e(x)$ is the attribute value of object $x\in U^e$ on attributes $A$;
A3. The attack behavior feature representation is a set of sequences describing the attack behavior characteristics of APT groups; each feature sequence is described by a set of malware attributes and is defined as $Behavior=\langle U^m, At^m, \{V_a^m \mid a\in At^m\}, \{I_a^m \mid a\in At^m\}\rangle$, where $U^m$ is a collection of malware; $At^m=C^m\cup D$ is the attribute set; the conditional attribute set $C^m$ includes time, static features, dynamic features, and vulnerability features; $V_a^m$ is the value range of attribute $a$; $I_a^m:U^m\to V_a^m$ is the information function; and for $A\subseteq At^m$, $I_A^m(x)$ is the attribute value of object $x\in U^m$ on attributes $A$;
A4. Reduce the conditional attribute set of the attack behavior feature representation generated in step A3, selecting a feature subset that carries more decision information according to the mutual information values of the different features;
B. Construction of APT organization behavior patterns: apply rough set operators to define APT organizational behavior, divide the knowledge representation into three parts, the precise domain, the fuzzy domain, and the irrelevant domain, and dynamically generate APT organizational behavior patterns through lower and upper approximations;
C. Organizational relationship measurement: using the behavioral fuzzy domains, apply an organizational relationship measurement method based on node correlation to calculate the relationships between different APT organizations, construct an APT organizational relationship network, and analyze how the relationships between organizations evolve.
2. The method for quantitative analysis of inter-organizational relationships in APT according to claim 1, characterized in that step B further comprises the following steps:
B1. Partition the APT organization sample set using rough set operators, expressed as $\underline{apr}(V(g_k))=\{x\in U\mid [x]\subseteq V(g_k)\}$ and $\overline{apr}(V(g_k))=\{x\in U\mid [x]\cap V(g_k)\neq\emptyset\}$, where $V(g_k)$ is the sample set of APT organization $g_k$; the lower approximation $\underline{apr}(V(g_k))$ contains the feature sequences unique to APT organization $g_k$; the upper approximation $\overline{apr}(V(g_k))$ contains all feature sequences that can be associated with APT organization $g_k$; and $[x]$ is the equivalence class of feature sequence $x$;
B2. Calculate the rough set membership degree $\mu_{V(g_k)}(x)=|[x]\cap V(g_k)|/|[x]|$ of each element $x$, i.e., the inaccuracy coefficient with which sample $x$ belongs to the APT organization sample set;
B3. Based on the rough set membership degrees calculated in step B2, define the subset of samples with a membership degree greater than or equal to 1 as the precise domain, the subset of samples with a membership degree greater than 0 and less than 1 as the fuzzy domain, and the subset of samples with a membership degree less than or equal to 0 as the irrelevant domain. The APT organizational behavior model is composed of these three subdomains: $Pattern(g_k)=\langle Prec(g_k),Fuzz(g_k),Irr(g_k)\rangle$.
3. The method for quantitative analysis of inter-organizational relationships in APT according to claim 1, characterized in that step C further comprises the following steps:
C1. Apply the Jaccard coefficient to the fuzzy domains of APT organizational behavior, calculate the direct relationship coefficients between organizations, and generate the direct relationship network of APT organizations;
C2. Based on the inter-organizational relationship distances calculated in step C1, define the weight of an undirected path passing through multiple nodes as the minimum distance in the path divided by the number of hops in the path;
C3. Considering the topology and path weights of the APT organization direct relationship network, introduce a common neighbor relationship coefficient as another indicator for quantifying inter-organizational relationships, defined as the sum of the weights of all paths that start and end at the node pair $\langle g_i,g_j\rangle$ and pass through the common neighbor nodes of organizations $g_i$ and $g_j$, with the result normalized;
C4. Sum and normalize the direct relationship coefficients obtained in step C1 and the common neighbor relationship coefficients obtained in step C3 to obtain the APT inter-organizational relationship coefficients, and generate the APT organizational relationship network;
C5. Based on the temporal characteristics of behavioral patterns, generate APT organizational relationship networks for different time intervals, observe relationship changes, and analyze evolutionary patterns.