Malicious domain name detection method and device, electronic equipment and storage medium
By constructing an undirected weighted domain name graph and a long short-term memory network combined with a malicious domain name detection model, the problem of low accuracy and reliability of malicious domain name detection in existing technologies is solved, and efficient identification of malicious domain names is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA TELECOM CORP LTD
- Filing Date
- 2023-06-13
- Publication Date
- 2026-06-23
Figure CN116896454B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of cybersecurity technology, and in particular to methods, apparatus, electronic devices, and computer-readable storage media for detecting malicious domain names.
Background Technology
[0002] With the rapid development of internet science and technology, daily life is increasingly intertwined with the internet. While this rapid development brings convenience, it also presents various challenges to cybersecurity. Malicious online behavior is becoming increasingly serious, generally categorized as follows: spreading malware, assisting servers in communication, sending spam, fraud, and phishing. There are various methods and approaches for detecting malicious behavior, among which malicious domain name detection is a relatively common method for improving cybersecurity.
[0003] Existing malicious domain detection methods mainly include blacklist detection and traffic detection. Blacklist detection determines whether a domain is malicious by comparing it one-to-one with a blacklist. Traffic detection extracts relevant features from domain name resolution response information and uses machine learning methods for domain classification and identification. As can be seen from existing malicious domain detection methods, the accuracy of blacklist detection relies heavily on the blacklist itself. However, with the rapid development of network technology and the rapid growth of data, blacklists cannot always comprehensively cover active malicious domains. Blacklist databases require real-time updates and maintenance, which is costly. Furthermore, the growth rate of malicious domains far exceeds the update rate of the blacklist, resulting in incomplete detection results and reduced accuracy. On the other hand, malicious domain detection based on local traffic features is vulnerable to modification or bypassing by attackers, leading to reduced reliability of the detection results.
[0004] It is evident that existing methods for detecting malicious domain names still need improvement.
Summary of the Invention
[0005] This application provides a method, apparatus, and electronic device for detecting malicious domain names, which can solve or improve the problems of low accuracy and / or low reliability in detecting malicious domain names.
[0006] Firstly, embodiments of this application disclose a method for detecting malicious domain names, including:
[0007] Based on domain association data, obtain the given malicious domain and the domain to be detected, as well as the preset domain characteristics of the given malicious domain and the domain to be detected;
[0008] Based on preset domain characteristics and a given malicious domain, obtain the maliciousness score of the domain to be detected, wherein the maliciousness score is used to represent the degree of association between the domain to be detected and the given malicious domain;
[0009] The malicious score and the preset domain name features are input into a pre-trained long short-term memory network to obtain the hidden layer features output by the long short-term memory network after feature encoding of the current input;
[0010] The malicious score, the preset domain name features, and the hidden layer features are input into a pre-trained malicious domain name detection model to obtain the domain name detection results of the domain name to be detected.
[0011] Secondly, embodiments of this application disclose a malicious domain name detection device, comprising:
[0012] The domain name and domain name feature acquisition module is used to acquire a given malicious domain name and a domain name to be detected, as well as the preset domain name features of the given malicious domain name and the domain name to be detected, based on the domain name association data.
[0013] The malicious score acquisition module is used to acquire the malicious score of the domain to be detected based on preset domain characteristics and a given malicious domain, wherein the malicious score is used to represent the degree of association between the domain to be detected and the given malicious domain.
[0014] The hidden layer feature acquisition module is used to input the malicious score and the preset domain name features into a pre-trained long short-term memory network to obtain the hidden layer features output by the long short-term memory network after feature encoding of the current input;
[0015] The domain name detection module is used to input the malicious score, the preset domain name features and the hidden layer features into a pre-trained malicious domain name detection model to obtain the domain name detection results of the domain name to be detected.
[0016] Thirdly, embodiments of this application also disclose an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the malicious domain name detection method described in embodiments of this application.
[0017] Fourthly, embodiments of this application disclose a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the steps of the malicious domain name detection method disclosed in embodiments of this application.
[0018] The malicious domain name detection method disclosed in this application obtains a given malicious domain name and a domain name to be detected, as well as preset domain name features for each of the given malicious domain name and the domain name to be detected, based on domain name association data. It then obtains a malicious score for the domain name to be detected based on the preset domain name features and the given malicious domain name. Next, it inputs the malicious score and the preset domain name features into a pre-trained Long Short-Term Memory (LSTM) network to obtain the hidden layer features output by the LSTM network after feature encoding of the current input. Finally, it inputs the malicious score, the preset domain name features, and the hidden layer features into a pre-trained malicious domain name detection model to obtain the domain name detection result for the domain name to be detected. This method combines the feature processing advantages of two different types of models (machine learning models and deep learning models): it captures the temporal features of domain name data while avoiding the influence of sparse data, thus helping to improve the accuracy and reliability of domain name detection.
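The data flow of the last two steps can be sketched as follows. This is only an illustrative sketch: the function name `detect_domain`, the flat list encoding of the inputs, and the two stand-in callables `toy_encoder` and `toy_model` are assumptions made here, not the pre-trained LSTM network or detection model of this application.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def detect_domain(
    malicious_score: float,
    preset_features: Vector,
    lstm_encode: Callable[[Vector], Vector],
    detection_model: Callable[[Vector], str],
) -> str:
    """Chain the encoder and the detection model as described in the text."""
    # The malicious score and the preset features together form the encoder input.
    lstm_input = [malicious_score, *preset_features]
    hidden_features = list(lstm_encode(lstm_input))
    # The detection model receives the score, the preset features
    # and the hidden layer features together.
    model_input = [malicious_score, *preset_features, *hidden_features]
    return detection_model(model_input)


# Stand-ins for the pre-trained models (assumptions, for illustration only).
def toy_encoder(x: Vector) -> Vector:
    return [sum(x) / len(x)]


def toy_model(x: Vector) -> str:
    return "malicious" if x[0] >= 0.5 else "benign"
```

Passing the models in as callables keeps the sketch independent of any particular deep learning library; a real encoder would return the hidden state of a trained LSTM instead of an average.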
[0019] The above description is only an overview of the technical solution of this application. In order to better understand the technical means of this application and to implement it in accordance with the contents of the specification, and to make the above and other objects, features and advantages of this application more obvious and understandable, the following are specific embodiments of this application.
Attached Figure Description
[0020] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0021] Figure 1 is a flowchart of the malicious domain name detection method disclosed in the embodiments of this application;
[0022] Figure 2 is a schematic diagram of the domain name resolution graph disclosed in the embodiments of this application;
[0023] Figure 3 is a schematic diagram of the undirected weighted domain name graph disclosed in the embodiments of this application;
[0024] Figure 4 is a schematic diagram of the application architecture of the malicious domain name detection model disclosed in the embodiments of this application;
[0025] Figure 5 is a schematic diagram of the malicious domain name detection device disclosed in the embodiments of this application;
[0026] Figure 6 schematically illustrates a block diagram of an electronic device for performing the method according to this application; and
[0027] Figure 7 schematically illustrates a storage unit for holding or carrying program code implementing the method according to this application.
Detailed Implementation
[0028] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0029] The Domain Name System (DNS) is a core infrastructure of the Internet, a distributed database that primarily stores information about the mapping between domain names and IP addresses. It provides easy-to-remember mappings between domain names and complex IP addresses, driving the development of the Internet. Network users can conveniently access network resources through semantically meaningful domain names. For example, passive DNS replicates DNS messages captured between servers by sensors voluntarily deployed by users within their DNS infrastructure. These captured DNS messages are further processed and stored in a central DNS record database, which can be used for various purposes, providing a comprehensive view of the mapping between domain names and IP addresses.
[0030] With the development of network technology, an increasing number of criminals are registering malicious domain names to launch cyberattacks, causing losses to legitimate users and posing a serious threat to network security. Malicious domain name detection methods have emerged to address this need.
[0031] When using blacklist-based detection methods for malicious domain name detection, attackers can easily evade detection by introducing domain name generation algorithms. These algorithms allow attackers to periodically generate a large number of malicious domain names randomly, making them easy to pass through blacklists. Attackers only need to register a subset of these domains; once a controlled host accesses any of these registered domains, the control server can communicate with it. This makes the communication point between the control and controlled ends dynamically changing, making it difficult for security administrators and protection systems to identify malicious domains. Furthermore, when using traffic detection methods for malicious domain name detection, the local characteristics of traffic can be easily altered by attackers and are difficult to reconstruct. These issues lead to a decrease in the accuracy and reliability of malicious domain name detection.
[0032] The malicious domain name detection method disclosed in this application analyzes the global correlation between domain names using passive DNS data and combines it with historical domain name behavior data. This replaces the previous model that only focused on local features. Through correlation, a large number of new malicious domain names can be discovered using a very small set of known malicious domain names. This achieves efficient detection of both known and unknown types of malicious domain names.
[0033] As shown in Figure 1, the malicious domain name detection method disclosed in this application includes steps 110 to 140.
[0034] Step 110: Based on the domain association data, obtain the given malicious domain name and the domain name to be detected, as well as the preset domain characteristics of the given malicious domain name and the domain name to be detected.
[0035] Optionally, the domain name association data includes one or more of the following: passive DNS data, blacklist data, domain name historical behavior data, and DGA domain name data.
[0036] Optionally, based on the domain association data, obtain the given malicious domain name and the domain name to be detected, as well as the preset domain name characteristics of the given malicious domain name and the domain name to be detected, including: sub-step 1101, sub-step 1102 and sub-step 1103.
[0037] Sub-step 1101: Obtain domain association data.
[0038] Optionally, passive DNS is a record system used to store DNS resolution data for a given location, record, and time range. Alternatively, DNS data can be downloaded from a domain registration website. Downloaded DNS data includes, but is not limited to, registered domain names and their corresponding IP addresses.
[0039] Optionally, the blacklist data includes a list of malicious domain names. The blacklist data can be dynamically updated based on the detection results of malicious domain names, or it can be updated and maintained based on manual additions by users. Optionally, the blacklist data can be downloaded from a domain name server or domain name service website.
[0040] Optionally, the domain name historical behavior data includes, but is not limited to, one or more of the following: the domain name's TTL value (Time-To-Live, which is the retention time of the domain name resolution record in the DNS server), the number of times the domain name's WHOIS (a database used to query detailed information of registered domain names) has been updated, the number of times the IP address corresponding to the domain name has changed, the number of IP address changes corresponding to the domain name, the domain names with the same IP address and their number, the domain name's creation time, the completeness of the WHOIS information, etc.
[0041] The DGA domain name data includes DGA (domain generation algorithm) domains, which are domains generated by a pseudo-random domain name generation algorithm. Optionally, a list of DGA-generated domain names can be obtained from publicly available DGA datasets.
[0042] Those skilled in the art should understand that various technologies can be used to obtain the aforementioned domain name association data, and the specific technical means for obtaining domain name association data are not limited in the embodiments of this application. It should be noted that when implementing the malicious domain name detection method disclosed in the embodiments of this application, the domain name association data includes at least: passive DNS data, blacklist data, and domain name historical behavior data.
[0043] Sub-step 1102: Perform data preprocessing on the domain name association data to obtain preprocessed data.
[0044] The data preprocessing includes at least the following: deleting passive DNS data and historical domain behavior data corresponding to high-traffic domains.
[0045] The domain association data downloaded from websites or domain name systems includes a massive amount of domain names and their historical records. This data needs to be preprocessed to remove interference or invalid data in order to improve the accuracy of domain name detection.
[0046] Optionally, the domain association data can be preprocessed to obtain preprocessed data, including performing one or more of the following data preprocessing operations: outlier handling, missing value handling, etc.
[0047] For example, handling outliers in the domain association data includes deleting the passive DNS data and historical domain behavior data corresponding to popular domains. Optionally, the popular domains can be the domains that rank within a predetermined top percentage (e.g., the top 10%) by DNS query count. Specifically, domains such as baidu.com and hao123.com have very high DNS query counts but are mostly benign. If these domains were used for malicious attacks, ISPs (Internet Service Providers) could easily identify and block them. Deleting these popular domains therefore does not affect the correctness and effectiveness of the system.
[0048] For example, when handling missing values in domain association data, this includes deleting domains that have no historical data.
[0049] After the aforementioned data preprocessing operations, preprocessed data is obtained.
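The two preprocessing operations above can be sketched as follows. This is a minimal sketch under stated assumptions: passive DNS data is represented as a list of `(domain, ip)` tuples with one tuple per observed query, historical behavior data as a dictionary keyed by domain, and the names `preprocess` and `top_fraction` are hypothetical.

```python
from collections import Counter


def preprocess(pdns_records, history, top_fraction=0.10):
    """Drop popular domains (outliers) and domains without history (missing values).

    pdns_records: list of (domain, ip) tuples, one per observed DNS query (assumed layout).
    history: dict mapping domain -> dict of historical behavior fields (assumed layout).
    """
    # Rank domains by DNS query count, highest first.
    query_counts = Counter(domain for domain, _ in pdns_records)
    ranked = [domain for domain, _ in query_counts.most_common()]
    # Outlier handling: the top fraction of domains by query count is treated as popular.
    popular = set(ranked[: int(len(ranked) * top_fraction)])
    # Missing-value handling: a domain with no historical data is dropped as well.
    kept_records = [
        (domain, ip)
        for domain, ip in pdns_records
        if domain not in popular and history.get(domain)
    ]
    kept_history = {
        domain: fields
        for domain, fields in history.items()
        if domain not in popular and fields
    }
    return kept_records, kept_history
```

The 10% default mirrors the example percentage given above; in practice the threshold would be tuned to the data set.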
[0050] Sub-step 1103: Based on the preprocessed data, obtain the given malicious domain name and the domain name to be detected, as well as the preset domain name features of the given malicious domain name and of the domain name to be detected, respectively.
[0051] Optionally, obtaining, based on the preprocessed data, the given malicious domain name and the domain name to be detected, as well as their respective preset domain name features, includes: obtaining a domain name set based on the preprocessed data obtained after preprocessing the passive DNS data, wherein the domain name set includes the domain name to be detected; obtaining the given malicious domain name based on the preprocessed data obtained after preprocessing the blacklist data and the DGA domain name data; and obtaining the preset domain name features of each domain name to be detected and of each given malicious domain name based on the preprocessed data obtained after preprocessing the historical domain name behavior data, together with the obtained domain name to be detected and given malicious domain name.
[0052] Optionally, the preset domain name characteristics include, but are not limited to, one or more of the following characteristics: domain name, IP address corresponding to the domain name, TTL value of the domain name, number of WHOIS updates of the domain name, number of changes to the IP address corresponding to the domain name, domain names with the same IP address, number of domain names with the same IP address, creation time of the domain name, and completeness of the WHOIS of the domain name.
[0053] For example, domain characteristics such as domain name, corresponding IP address, domains with the same IP address, and number of domains with the same IP address can be extracted from the preprocessed data obtained after preprocessing passive DNS data.
[0054] For example, a set of given malicious domains can be obtained from preprocessed data obtained after preprocessing blacklist data and DGA domain data, and domain characteristics such as domain name, corresponding IP address, domains with the same IP address, and number of domains with the same IP address can be extracted. The given malicious domains include domains in the blacklist data and domains in the DGA data list.
[0055] For example, the historical behavior characteristics of each domain can be extracted from the preprocessed data obtained after preprocessing the historical behavior data of the domain. These historical behavior characteristics include, but are not limited to, one or more of the following: the domain's TTL value, the number of times the domain's WHOIS has been updated, the number of times the IP address corresponding to the domain has changed, the number of IP address changes corresponding to the domain, the number of domains with the same IP address, the domain's creation time, and the completeness of the domain's WHOIS.
[0056] Optionally, from the preprocessed data obtained after preprocessing the domain historical behavior data, the domain historical behavior features of each domain are extracted, including: the domain historical behavior features of each given malicious domain, and / or, the domain historical behavior features of each domain to be detected.
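As an illustration of the features derived from passive DNS data (corresponding IP addresses, domains with the same IP address, and their number), a minimal sketch is given below. The record layout (a list of `(domain, ip)` tuples) and the dictionary keys are assumptions chosen for this example.

```python
from collections import defaultdict


def pdns_features(pdns_records):
    """Derive per-domain features from (domain, ip) resolution records."""
    ips_of = defaultdict(set)      # domain -> IP addresses it resolves to
    domains_of = defaultdict(set)  # IP address -> domains resolving to it
    for domain, ip in pdns_records:
        ips_of[domain].add(ip)
        domains_of[ip].add(domain)

    features = {}
    for domain, ips in ips_of.items():
        # Every other domain that shares at least one IP address with this domain.
        same_ip = set().union(*(domains_of[ip] for ip in ips)) - {domain}
        features[domain] = {
            "domain": domain,
            "ips": sorted(ips),
            "same_ip_domains": sorted(same_ip),
            "same_ip_domain_count": len(same_ip),
        }
    return features
```

Historical behavior fields such as the TTL value or the number of WHOIS updates could be merged into the same dictionaries from the preprocessed historical data.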
[0057] Step 120: Obtain the malicious score of the domain name to be detected based on the preset domain name characteristics and the given malicious domain name.
[0058] The malicious score is used to represent the degree of association between the domain name to be detected and a given malicious domain name.
[0059] In the embodiments of this application, a domain name resolution graph is established based on the preset domain name characteristics, and the maliciousness score of the domain name to be detected with respect to the given malicious domain name is calculated.
[0060] Optionally, the step of obtaining the malicious score of the domain name to be detected based on the preset domain name characteristics and the given malicious domain name includes: sub-step 1201, sub-step 1202 and sub-step 1203.
[0061] Sub-step 1201: Based on the domain name and IP address resolution mapping relationship expressed by the preset domain name features, construct edges to obtain a domain name resolution graph, wherein the edges are used to connect domain names and IP addresses that have a resolution mapping relationship.
[0062] Optionally, the preset domain name characteristics include: the resolution mapping relationship between domain names and IP addresses, such as the IP address corresponding to a domain name and the domain name that resolves to a specified IP address. By analyzing the preset domain name characteristics of each domain name, it is possible to determine which IP address(s) each domain name can resolve to, and similarly, which domain names(s) can resolve to a specific IP address.
[0063] Take a domain name resolution graph T(D,I,E) as an example, where D represents the set of domain names, I represents the set of IP addresses, and E represents the set of edges. Given a domain name d, IP(d) denotes the set of IP addresses that d resolves to. If a domain name d_i resolves to an IP address ip_i, that is, ip_i ∈ IP(d_i), then there exists a corresponding edge {d_i, ip_i} ∈ E in the domain name resolution graph. Similarly, D(ip) denotes the set of domain names that resolve to the IP address ip.
[0064] A domain name resolution graph describes the mapping relationship between IP addresses and domain names in passive DNS data. Take the domain name resolution graph shown in Figure 2 as an example, where circles represent domain names and squares represent IP addresses. The domain name resolution graph in Figure 2 describes the resolution mapping relationship between domain names and IP addresses, which may include: domain name 211 and domain name 212 can resolve to IP address 221; domain name 211 and domain name 213 can resolve to IP address 222; domain name 213 and domain name 214 can resolve to IP address 223; and domain name 215 and domain name 216 can resolve to IP address 224. Equivalently, the domain name resolution graph in Figure 2 can be read from the IP address side: domain names that resolve to IP address 221 include domain name 211 and domain name 212; domain names that resolve to IP address 222 include domain name 211 and domain name 213; domain names that resolve to IP address 223 include domain name 213 and domain name 214; and domain names that resolve to IP address 224 include domain name 215 and domain name 216.
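The domain name resolution graph can be held as two adjacency maps, one per direction of the mapping. The sketch below is illustrative only; the function name `build_resolution_graph` and the use of the Figure 2 reference numerals as string identifiers are assumptions.

```python
from collections import defaultdict


def build_resolution_graph(pdns_records):
    """Return (ip_of, domains_of): IP(d) for each domain and D(ip) for each IP address."""
    ip_of = defaultdict(set)       # IP(d): IP addresses that domain d resolves to
    domains_of = defaultdict(set)  # D(ip): domains that resolve to IP address ip
    for domain, ip in pdns_records:
        # Each record is one edge {d, ip} of the resolution graph.
        ip_of[domain].add(ip)
        domains_of[ip].add(domain)
    return ip_of, domains_of


# Edges of the Figure 2 example, using the reference numerals as identifiers.
FIGURE_2_EDGES = [
    ("211", "221"), ("212", "221"),
    ("211", "222"), ("213", "222"),
    ("213", "223"), ("214", "223"),
    ("215", "224"), ("216", "224"),
]

ip_of, domains_of = build_resolution_graph(FIGURE_2_EDGES)
```

Keeping both maps makes the two readings of Figure 2 (from the domain side and from the IP address side) direct dictionary lookups.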
[0065] Sub-step 1202: Based on the domain name resolution graph, construct an undirected weighted domain name graph, wherein the edges in the undirected weighted domain name graph connect pairs of domain names that resolve to at least one common IP address, and the weight of an edge in the undirected weighted domain name graph is determined based on the two domain names connected by the edge and the number of common IP addresses that the two domain names resolve to.
[0066] Next, based on the domain name resolution graph T(D,I,E), an undirected weighted domain name graph DT(D,E) is constructed. Here, D represents a set of domain names, which can be selected from the set of domain names in the domain name resolution graph; E represents the set of edges, each connecting two domain names. When constructing the undirected weighted domain name graph DT(D,E), if domain names d1 and d2 resolve to at least one common IP address, i.e., when ip(d1)∩ip(d2) is not empty, then an edge e={d1,d2} is constructed in the undirected weighted domain name graph DT to connect domain names d1 and d2, where e={d1,d2}∈E.
[0067] Next, weights are assigned to the edges of the undirected weighted domain name graph to represent the degree of association between the two domain names connected by the corresponding edge.
[0068] Optionally, the weight of an edge in the undirected weighted domain name graph is determined as follows: if the two domain names connected by the edge are the same, the weight of the edge is determined to be 1; if the two domain names connected by the edge are different, the weight of the edge is determined to be a value that is positively correlated with the number of common IP addresses resolved by the two domain names connected by the edge, and less than 1. That is, the same domain names have the highest correlation, and for different domain names, the more common IP addresses resolved, the greater the correlation.
[0069] For example, the weight of an edge in an undirected weighted domain name graph can be calculated using the following formula:
[0070]
[0071] Where {d_i, d_j} represents an edge in the undirected weighted domain name graph connecting the two domain names d_i and d_j, |ip(d_i)∩ip(d_j)| denotes the number of common IP addresses that d_i and d_j resolve to, and W{d_i, d_j} represents the weight of the edge {d_i, d_j}.
[0072] Those skilled in the art should understand that the above formula is merely one formula that can be used to implement this application, and not the only formula, and should not be regarded as a limitation on the specific implementation of setting the weights of edges in the undirected weighted domain name graph in this application. Based on the above ideas of this application, those skilled in the art can also use other formulas to set the weights of edges in the undirected weighted domain name graph, which will not be listed one by one in the embodiments of this application.
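A possible construction of the undirected weighted domain name graph is sketched below. The weight function `1 - 1/(1 + n)`, where n is the number of common IP addresses, is an assumption chosen only because it satisfies the stated properties (it grows with n and stays below 1); it is not claimed to be the formula of this application. An edge is added whenever ip(d1)∩ip(d2) is not empty, which matches the Figure 2 and Figure 3 example.

```python
from itertools import combinations


def edge_weight(d1, d2, ip_of):
    """Weight of the edge {d1, d2}; the formula for distinct domains is an assumption."""
    if d1 == d2:
        return 1.0  # identical domain names have the highest correlation
    n_common = len(ip_of[d1] & ip_of[d2])
    # Assumed example: increases with n_common and is always less than 1.
    return 1.0 - 1.0 / (1.0 + n_common)


def build_domain_graph(ip_of):
    """Build an adjacency map: graph[d1][d2] = weight, for domains sharing an IP address."""
    graph = {domain: {} for domain in ip_of}
    for d1, d2 in combinations(sorted(ip_of), 2):
        if ip_of[d1] & ip_of[d2]:  # ip(d1) ∩ ip(d2) is not empty
            weight = edge_weight(d1, d2, ip_of)
            graph[d1][d2] = weight
            graph[d2][d1] = weight  # undirected: store both directions
    return graph


# IP(d) for the Figure 2 example, using the reference numerals as identifiers.
FIGURE_2_IP_OF = {
    "211": {"221", "222"}, "212": {"221"}, "213": {"222", "223"},
    "214": {"223"}, "215": {"224"}, "216": {"224"},
}

graph = build_domain_graph(FIGURE_2_IP_OF)
```

With this assumed formula, one shared IP address gives a weight of 0.5 and two shared IP addresses give about 0.67, so more shared addresses mean a stronger association.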
[0073] Sub-step 1203: Based on the association between the domain name to be detected and the given malicious domain name described in the undirected weighted domain name graph, obtain the malicious score of the domain name to be detected.
[0074] In some embodiments of this application, the domains in the undirected weighted domain name graph include: a domain name to be detected and a given malicious domain name. Next, based on the association relationship between the domain names described in the undirected weighted domain name graph, and the association relationship between the domain name to be detected and the given malicious domain name, the maliciousness score of the domain name to be detected is obtained.
[0075] In practical applications, the association between the domain name to be detected and a given malicious domain name includes direct and indirect associations. Take the undirected weighted domain name graph shown in Figure 3 as an example, in which domain name 212 and domain name 214 are given malicious domain names. The domain name to be detected 211 and the given malicious domain name 212 resolve to the same IP address (e.g., IP address 221 in Figure 2), so the domain name to be detected 211 can be considered to be directly associated with the given malicious domain name 212; in this case, there is an edge L_12 in the undirected weighted domain name graph. The domain name to be detected 211 and another domain name to be detected 213 resolve to the same IP address (e.g., IP address 222 in Figure 2), so the domain name to be detected 211 is considered to be directly associated with domain name 213; in this case, there is an edge L_13 in the undirected weighted domain name graph. The domain name to be detected 213 and the given malicious domain name 214 resolve to the same IP address (e.g., IP address 223 in Figure 2), so the domain name to be detected 213 is considered to be directly associated with domain name 214; in this case, there is an edge L_34 in the undirected weighted domain name graph. The domain name to be detected 211 and the given malicious domain name 214 do not resolve to the same IP address, so they are considered to be indirectly associated, and there is no edge directly connecting the domain name to be detected 211 and the given malicious domain name 214 in the undirected weighted domain name graph.
[0076] The maliciousness score of the domain to be detected is calculated based on all the associations between the domain to be detected and the given malicious domain.
[0077] Optionally, based on the association between the domain name to be detected and the given malicious domain name described in the undirected weighted domain name graph, the maliciousness score of the domain name to be detected is obtained, including: sub-step A, sub-step B and sub-step C.
[0078] Sub-step A: Based on the edges in the undirected weighted domain name graph that directly connect the domain name to be detected and the given malicious domain name, and the weights of those edges, obtain the first association relationship between the domain name to be detected and the given malicious domain name.
[0079] Optionally, based on the edges directly connecting the domain to be detected and the given malicious domain in the undirected weighted domain graph, and the weights of the edges, the first association between the domain to be detected and the given malicious domain is obtained, including: traversing the undirected weighted domain graph to obtain all edges directly connecting the domain to be detected and the given malicious domain; and obtaining the first association between the domain to be detected and the given malicious domain based on the weights of all the obtained edges.
[0080] Optionally, the first association relationship includes: the weights of all edges connecting the domain to be detected and the given malicious domain in the undirected weighted domain graph. For example, in Figure 3, this is the weight of the edge L_12 connecting the domain name to be detected 211 and the given malicious domain name 212.
[0081] For example, for a domain name d to be detected, let M1(d) represent the first association between the domain name d to be detected and a given malicious domain name. Then the first association M1(d) between the domain name d to be detected and the given malicious domain name can be expressed as:
[0082] M1(d)={W(H_11,d),...,W(H_1n,d)};
[0083] Where H_1i ∈ H, and H is the set of given malicious domains; H_1i represents a given malicious domain that is directly connected to the domain to be detected d through an edge in the undirected weighted domain name graph; and W(H_1i,d) represents the weight of the edge between the domain name to be detected d and the given malicious domain name H_1i.
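The first association M1(d) amounts to collecting the weights of the edges that join d directly to a given malicious domain. A minimal sketch follows, assuming the graph is stored as a nested dictionary `graph[d1][d2] = weight`; the edge weights of 0.5 in the Figure 3 example are illustrative values, not values taken from this application.

```python
def first_association(graph, d, malicious):
    """Return {H_1i: W(H_1i, d)} for every given malicious domain adjacent to d."""
    return {
        neighbor: weight
        for neighbor, weight in graph.get(d, {}).items()
        if neighbor in malicious
    }


# Figure 3 example; 212 and 214 are the given malicious domains.
FIGURE_3_GRAPH = {
    "211": {"212": 0.5, "213": 0.5},
    "212": {"211": 0.5},
    "213": {"211": 0.5, "214": 0.5},
    "214": {"213": 0.5},
}
GIVEN_MALICIOUS = {"212", "214"}

m1_211 = first_association(FIGURE_3_GRAPH, "211", GIVEN_MALICIOUS)
```

For domain 211 only the edge L_12 contributes, because 214 is reachable from 211 only through 213.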
[0084] Sub-step B: Based on the path formed by at least two edges indirectly connecting the domain to be detected and the given malicious domain in the undirected weighted domain graph, and the weight of the path, a second association relationship between the domain to be detected and the given malicious domain is obtained.
[0085] Optionally, based on the path formed by at least two edges indirectly connecting the domain to be detected and the given malicious domain in the undirected weighted domain graph, and the weight of the path, a second association relationship between the domain to be detected and the given malicious domain is obtained, including: traversing the undirected weighted domain graph to obtain all paths connecting the domain to be detected and the given malicious domain that are formed by at least two edges; calculating the weight corresponding to the path based on the weight of the edges constituting each path; for each given malicious domain, taking the path with the largest weight connecting the domain to be detected and the given malicious domain as the target path connecting the domain to be detected and the given malicious domain; and obtaining the second association relationship between the domain to be detected and the given malicious domain based on the weight corresponding to the target path connecting the domain to be detected and each given malicious domain.
[0086] First, by traversing the undirected weighted domain name graph, all paths connecting the domain to be detected and the given malicious domain are obtained. Then, paths containing only one edge are deleted, retaining all paths consisting of at least two edges. Paths consisting of at least two sequentially connected edges represent indirect connections between the domain to be detected and the given malicious domain. An example is the path in Figure 3 formed by edges $L_{13}$ and $L_{34}$, which connects the domain name to be detected 211 and the given malicious domain name 214.
[0087] For a domain name to be detected and a given malicious domain name connected by two or more edges, let $L = (d_1, d_2, \ldots, d_K)$ denote a path between the domain name to be detected $d_1$ and the given malicious domain name $d_K$. The weight of the path is determined by the weights of the edges that connect consecutive nodes on the path. Optionally, the weight of a path can be obtained by multiplying the weights of the edges that make up the path. For example, the weight of path $L$ can be calculated as $W(L) = \prod_{k=1}^{K-1} W(d_k, d_{k+1})$.
[0088] For each path between the domain to be detected and the given malicious domain, the weight of the path is calculated using the method described above. Take the domain to be detected $d_1$ and the given malicious domain $d_K$ as an example, and suppose there are m paths between them, denoted $L_1, L_2, \ldots, L_m$. After calculation, the weight of each of the m paths is obtained. Optionally, the association represented by the path with the highest weight is selected as the second association between the domain name to be detected $d_1$ and the given malicious domain name $d_K$. That is, for the paths $L_1, L_2, \ldots, L_m$ between $d_1$ and $d_K$, $MaxL(d_1, d_K) = \max(W(L_l))$, where $1 \le l \le m$.
[0089] As mentioned earlier, for a domain name d to be detected, if M2(d) represents the second association between the domain name d to be detected and a given malicious domain name, then the second association M2(d) between the domain name d to be detected and the given malicious domain name can be expressed as:
[0090] $M2(d)=\{MaxL(H_{21},d),\ldots,MaxL(H_{2n},d)\}$;
[0091] Here, $H_{2j} \in H$, where $H$ is the set of given malicious domain names; $H_{2j}$ represents a given malicious domain name; and $MaxL(H_{2j},d)$ represents the weight of the target path between the domain name to be detected $d$ and the given malicious domain name $H_{2j}$.
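The second association relationship can be sketched as follows, assuming the path weight is the product of its edge weights (the optional rule stated above). The adjacency-map representation, the function names, and the example values are hypothetical. The sketch enumerates simple paths exhaustively, which is only practical for small graphs.

```python
from typing import Dict, Optional, Set

# Hypothetical adjacency map: node -> {neighbour: edge weight}.
Adj = Dict[str, Dict[str, float]]


def target_path_weight(adj: Adj, d: str, h: str) -> Optional[float]:
    """Largest product of edge weights over simple paths d -> h with >= 2 edges."""
    best: Optional[float] = None

    def dfs(node: str, weight: float, edges: int, visited: Set[str]) -> None:
        nonlocal best
        if node == h:
            # Paths with a single edge are direct connections and are skipped.
            if edges >= 2 and (best is None or weight > best):
                best = weight
            return
        for nxt, w in adj.get(node, {}).items():
            if nxt not in visited:
                dfs(nxt, weight * w, edges + 1, visited | {nxt})

    dfs(d, 1.0, 0, {d})
    return best


def second_association(adj: Adj, d: str, malicious: Set[str]) -> Dict[str, float]:
    """Return {H_2j: MaxL(H_2j, d)} for malicious domains indirectly connected to d."""
    m2: Dict[str, float] = {}
    for h in malicious:
        weight = target_path_weight(adj, d, h) if h != d else None
        if weight is not None:
            m2[h] = weight
    return m2


adj: Adj = {
    "d.example": {"a.example": 0.8, "b.example": 0.5, "bad.example": 0.9},
    "a.example": {"d.example": 0.8, "bad.example": 0.5},
    "b.example": {"d.example": 0.5, "bad.example": 0.5},
    "bad.example": {"d.example": 0.9, "a.example": 0.5, "b.example": 0.5},
}
m2 = second_association(adj, "d.example", {"bad.example", "other.example"})
```

In this example the direct edge of weight 0.9 is ignored, and the target path runs through `a.example` because 0.8 × 0.5 exceeds 0.5 × 0.5.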
[0092] Sub-step C: Obtain the malicious score of the domain name to be detected based on the first association relationship and the second association relationship.
[0093] The union of the first association and the second association constitutes the association between the domain name to be detected and the given malicious domain name.
[0094] Optionally, the first association relationship includes: the weight of the edge connecting the domain to be detected and each identifiable malicious domain; the second association relationship includes: the weight of the target path connecting the domain to be detected and each identifiable malicious domain; the step of obtaining the malicious score of the domain to be detected based on the first association relationship and the second association relationship includes: calculating the malicious score of the domain to be detected by combining the weight of the edge in the first association relationship and the weight of the target path in the second association relationship.
[0095] Taking the aforementioned first association relationship M1(d) and second association relationship M2(d) as examples, the association relationship between the domain name to be detected and the given malicious domain name can be expressed as: M(d)=M1(d)∪M2(d).
[0096] Substituting the aforementioned first association M1(d) and second association M2(d), the association between the domain to be detected and the given malicious domain can be expressed as follows:
[0097] $M(d)=\{W(H_{11},d),\ldots,W(H_{1n},d),MaxL(H_{21},d),\ldots,MaxL(H_{2n},d)\}$;
[0098] Here, $W(H_{11},d),\ldots,W(H_{1n},d)$ represent the weights of the edges connecting the domain to be detected and each identifiable malicious domain, and $MaxL(H_{21},d),\ldots,MaxL(H_{2n},d)$ represent the weights of the target paths connecting the domain to be detected and each identifiable malicious domain.
[0099] Optionally, the weights of each side in the association between the domain to be detected and the given malicious domain and the weights of each target path can be summed, and the sum of the weights can be used as the malicious score of the domain to be detected.
[0100] Optionally, the weights in the association between the domain to be detected and the given malicious domain can be sorted in descending order; then, each weight is assigned a coefficient that is negatively correlated with its position in the sorted sequence, i.e., the later the position, the smaller the coefficient; finally, the weights are summed using the corresponding coefficients. For example, the malicious score MS(d, H) of the domain to be detected can be calculated using the following formula:
[0101]
[0102] In the above formula, N represents the total number of weights in the association relationship M(d) between the domain to be detected and the given malicious domain, i.e., the total number of associations. $W'(H_1,d)$ represents the first weight in the sorted weight sequence, and $W'(H_i,d)$ represents the i-th weight in the sorted weight sequence. $W'(H_i,d)$ may be the weight of a target path or the weight of an edge, as determined by the first and second association relationships obtained in the preceding steps.
[0103] In other embodiments of this application, other methods can be used to combine the weights of the edges in the first association relationship and the weights of the target path in the second association relationship to calculate the malicious score of the domain name to be detected. These methods will not be listed one by one in the embodiments of this application.
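The two optional scoring strategies described above (a plain sum, and a sum in which later-ranked weights receive smaller coefficients) can be sketched as follows. The exact coefficient used in the formula for MS(d, H) is not reproduced in this text, so the coefficient 1/i below is purely an illustrative assumption, as are the function names.

```python
from typing import Callable, Iterable


def malicious_score_sum(weights: Iterable[float]) -> float:
    """Plain sum of all edge weights and target-path weights in M(d)."""
    return sum(weights)


def malicious_score_ranked(
    weights: Iterable[float],
    coeff: Callable[[int], float] = lambda i: 1.0 / i,  # assumed, not from the source
) -> float:
    """Sort weights in descending order and sum them with rank-dependent coefficients.

    coeff(i) must decrease as the 1-based rank i grows, so that later-ranked
    weights contribute less to the score.
    """
    ordered = sorted(weights, reverse=True)
    return sum(coeff(i) * w for i, w in enumerate(ordered, start=1))


# M(d) as a flat list of edge weights and target-path weights (illustrative values).
m_d = [0.25, 0.5, 0.4]
score_sum = malicious_score_sum(m_d)
score_ranked = malicious_score_ranked(m_d)
```

Because the ranked variant sorts its input, the order in which the first and second association weights are collected does not affect the score.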
[0104] Step 130: Input the malicious score and the preset domain name features into a pre-trained long short-term memory network to obtain the hidden layer features output by the long short-term memory network after feature encoding of the current input.
[0105] The hidden layer features contain time information associated with the domain name to be detected.
[0106] If the malicious score is less than or equal to the preset malicious score threshold, more information needs to be obtained to detect malicious domains using a pre-trained model.
[0107] First, the malicious score and the preset domain name features obtained in the preceding steps are concatenated according to the input requirements of a pre-trained Long Short-Term Memory (LSTM) network to obtain a first concatenated feature. Then, the first concatenated feature is input into the LSTM network, which encodes the input first concatenated feature and outputs the corresponding result. In this embodiment, the hidden layer features output by a specified hidden layer of the LSTM network are obtained.
[0108] Specifically, take as an example a mini-batch input $X_t \in R^{q \times p}$ at time step t, where q is the number of samples and p is the input dimension, together with the hidden state of the previous time step $H_{t-1} \in R^{q \times h}$, where h is the number of hidden units. The input gate $I_t \in R^{q \times h}$, the forget gate $F_t \in R^{q \times h}$, and the output gate $O_t \in R^{q \times h}$ at time step t are calculated as follows:
[0109] $I_t = \sigma(X_t W_{xi} + H_{t-1} W_{hi} + b_i)$;
[0110] $F_t = \sigma(X_t W_{xf} + H_{t-1} W_{hf} + b_f)$;
[0111] $O_t = \sigma(X_t W_{xo} + H_{t-1} W_{ho} + b_o)$;
[0112] In the above formulas, $b_i, b_f, b_o \in R^{1 \times h}$ are bias parameters, and $W_{xi}, W_{xf}, W_{xo}, W_{hi}, W_{hf}, W_{ho}$ are weight parameters.
[0113] The candidate memory cell is calculated as $\tilde{C}_t = \tanh(X_t W_{xc} + H_{t-1} W_{hc} + b_c)$, where $W_{xc}$ and $W_{hc}$ are weight parameters and $b_c$ is a bias parameter.
[0114] Optionally, the hidden layer feature $H_t$ output at the t-th time step of the Long Short-Term Memory network can be selected for subsequent domain name detection.
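A single time step of the gate computations above can be sketched with NumPy. The text gives only the gates and the tanh-based memory term. The cell-state update $C_t = F_t \odot C_{t-1} + I_t \odot \tilde{C}_t$ and the hidden state $H_t = O_t \odot \tanh(C_t)$ used below are the standard LSTM formulation and are assumed here. The parameter dictionary layout, the sizes, and the random initialisation are likewise illustrative.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(X_t, H_prev, C_prev, params):
    """One LSTM time step; returns the new hidden state H_t and cell state C_t."""
    I_t = sigmoid(X_t @ params["W_xi"] + H_prev @ params["W_hi"] + params["b_i"])
    F_t = sigmoid(X_t @ params["W_xf"] + H_prev @ params["W_hf"] + params["b_f"])
    O_t = sigmoid(X_t @ params["W_xo"] + H_prev @ params["W_ho"] + params["b_o"])
    # Candidate memory cell from the tanh formula.
    C_tilde = np.tanh(X_t @ params["W_xc"] + H_prev @ params["W_hc"] + params["b_c"])
    # Standard LSTM state updates (assumed; not spelled out in the text).
    C_t = F_t * C_prev + I_t * C_tilde
    H_t = O_t * np.tanh(C_t)
    return H_t, C_t


q, p, h = 4, 3, 5  # samples, input dimension, hidden units (illustrative sizes)
rng = np.random.default_rng(0)
params = {}
for gate in "ifoc":
    params[f"W_x{gate}"] = rng.normal(scale=0.1, size=(p, h))
    params[f"W_h{gate}"] = rng.normal(scale=0.1, size=(h, h))
    params[f"b_{gate}"] = np.zeros((1, h))  # broadcast over the q samples

X_t = rng.normal(size=(q, p))
H_t, C_t = lstm_step(X_t, np.zeros((q, h)), np.zeros((q, h)), params)
```

`H_t` has one row of h hidden features per sample, which corresponds to the hidden layer feature passed on to the detection model.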
[0115] As mentioned earlier, the preset domain name features include the domain name's creation time. Through data mining, the inventors discovered that malicious domain names exhibit a periodic creation pattern. For example, after a malicious domain name is discovered, it is added to a blacklist. To attack the domain name resolution system, attackers will reapply for the malicious domain name, while normal domain names typically remain active. During the training phase, the Long Short-Term Memory (LSTM) network can better learn the domain name creation time feature, i.e., the domain name creation time in the ∪ function, thus carrying the domain name creation time feature in the output hidden layer features.
[0116] For example, during the training phase, the domain association data acquired for training the Long Short-Term Memory (LSTM) network and the malicious domain detection model can be preprocessed using the aforementioned method to extract preset domain features and identify malicious domains. Then, the aforementioned method is used to calculate the maliciousness score of unknown domains. Next, based on the comparison between the maliciousness score and a preset maliciousness score threshold, malicious domains are selected, resulting in a more reliable training dataset. For example, when the maliciousness score of an unknown domain is greater than the preset maliciousness score threshold, the unknown domain can be considered more likely to be malicious. Next, based on the relevant features of unknown domains with maliciousness scores greater than the preset maliciousness score threshold, the LSTM network and the malicious domain detection model are trained.
[0117] Optionally, the preset malicious score threshold can be determined based on the accuracy of domain name detection and historical experience.
[0118] For example, training data is constructed based on pre-defined domain characteristics and the malicious scores of several unknown domains whose malicious scores exceed a preset malicious score threshold. Then, the Long Short-Term Memory (LSTM) network is trained based on the constructed data, enabling the LSTM network to learn the mapping relationship between domain creation time and domain type.
[0119] The Long Short-Term Memory (LSTM) network used in the embodiments of this application may adopt network structures from the prior art, and this application does not limit the specific network structure of the LSTM network. For the training process of the LSTM network, refer to the prior art; it will not be described again in the embodiments of this application.
[0120] Because the collected domain name data includes both newly generated domain names and domain names that were generated long ago, the historical data for new domain names is incomplete. Training a model based on domain name features extracted from this data can lead to local biases. In this embodiment, by inputting the data into a Long Short-Term Memory (LSTM) network during training, the impact of missing samples can be minimized because the LSTM input has variable length, while also leveraging the advantages of the LSTM network in extracting time series features.
[0121] Step 140: Input the malicious score, the preset domain name features and the hidden layer features into the pre-trained malicious domain name detection model to obtain the domain name detection result of the domain name to be detected.
[0122] In the embodiments of this application, the architecture shown in Figure 4 is used to detect the domain name to be detected.
[0123] As shown in Figure 4, the maliciousness score of the domain to be detected is concatenated with preset domain features to obtain the domain features of the domain to be detected. These domain features are then input into a Long Short-Term Memory (LSTM) network to obtain the hidden features output by the LSTM network. Next, the domain features and the hidden features are concatenated to obtain a second concatenated feature, which is then input into a pre-trained malicious domain detection model. The malicious domain detection model makes a classification decision based on the second concatenated feature, provides a classification score, and outputs a detection result based on the classification score, indicating whether the domain to be detected is malicious.
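The data flow of Figure 4 can be sketched as two concatenations around an encoder and a classifier. The `encode` and `classify` callables below are trivial stand-ins for the pre-trained LSTM network and the pre-trained detection model, and the decision threshold of 0.5 is an assumption. None of these names or values come from this application.

```python
from typing import Callable, List, Sequence, Tuple


def detect(
    malicious_score: float,
    preset_features: Sequence[float],
    encode: Callable[[List[float]], Sequence[float]],
    classify: Callable[[List[float]], float],
    decision_threshold: float = 0.5,  # assumed cut-off on the classification score
) -> Tuple[List[float], bool]:
    """Return the second concatenated feature and the detection result."""
    # First concatenated feature: malicious score + preset domain name features.
    domain_features = [malicious_score, *preset_features]
    # Hidden layer features produced by the encoder for the current input.
    hidden_features = list(encode(domain_features))
    # Second concatenated feature: domain features + hidden layer features.
    second_feature = domain_features + hidden_features
    is_malicious = classify(second_feature) >= decision_threshold
    return second_feature, is_malicious


# Stand-ins for the trained models, used only to make the sketch runnable.
stub_encoder = lambda feats: [max(feats), min(feats)]
stub_classifier = lambda feats: feats[0]  # echoes the malicious score

second_feature, is_malicious = detect(0.8, [0.1, 0.3], stub_encoder, stub_classifier)
```

The point of the sketch is that the detection model sees the malicious score and preset features twice: once directly and once through the hidden layer features.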
[0124] In the embodiments of this application, the malicious domain name detection model employs a decision tree model. For example, the malicious domain name detection model uses the LightGBM algorithm (Light Gradient Boosting Machine, a gradient boosting framework that uses decision tree-based learning algorithms) to form a weighted fusion of multiple tree models. The final score of the input data is the weighted sum of the scores from each tree model.
[0125] The malicious domain name detection model needs to be pre-trained based on training samples. The method for generating training samples for the malicious domain name detection model is as follows: Based on domain name association data, obtain a given malicious domain name and an unknown domain name, as well as preset domain name features for each of the given malicious domain name and the unknown domain name; based on the preset domain name features and the given malicious domain name, obtain the malicious score of the unknown domain name; select the preset domain name features and the malicious score that are greater than a preset malicious score threshold, and input them into a pre-trained long short-term memory network to obtain the hidden layer features output by the long short-term memory network after feature encoding of the current input; generate training samples corresponding to the malicious domain name based on the malicious score, the preset domain name features, and the hidden layer features.
[0126] Then, the malicious domain detection model is trained based on the training samples corresponding to the malicious domains.
[0127] The specific methods for obtaining the characteristics of the preset domain name and the malicious score of the unknown domain name are described above and will not be repeated here.
[0128] Based on the training samples mentioned above, a malicious domain name detection model is trained. The trained malicious domain name detection model can output the category label that matches a given test sample.
[0129] To help readers understand the training process of a malicious domain name detection model, the following examples illustrate the parameter settings and adjustment methods during the training process.
[0130] LightGBM uses decision trees as its base model, and the relevant parameters mainly include tree depth, number of leaf nodes, minimum data size for leaf nodes, and learning rate.
[0131] During the training phase of the malicious domain name detection model, a smaller tree depth, fewer leaf nodes, larger leaf node data volume, and larger iteration step size are used in the early stages. In the later stages of iteration, detailed features of the samples in terms of data metrics are gradually mined, and the complexity of the model is gradually increased, including increasing the tree depth and the number of leaf nodes, and reducing the minimum data volume of nodes. The changes of the tree model's metrics with the number of iterations and the selection of initial values are shown in the table below:
[0132]
Parameter | Initial value | Change with iterations
Learning rate | 0.05 | Decrease
Number of leaf nodes | 2 | Increase
Tree depth | 2 | Increase
L2 regularization parameter | 0.1 | Decrease
Minimum data size for leaves | 100 | Decrease
L1 regularization parameter | 0.1 | Decrease
... | ... | ...
[0133] The parameter iteration strategy is as follows: the relevant parameter Q changes in a stepwise manner, where the step length (the number of iterations between adjustments) is defined as M, the change per step as N, the bound (upper or lower limit) of the parameter as F, the initial value of the parameter as G, and the current iteration number as C. The change formula is then defined as follows, where // denotes integer division:
[0134] Q = G + N * (C // M), if Q ≥ F (decreasing case) or Q ≤ F (increasing case);
[0135] Q = F, otherwise.
[0136] Let's take the depth of a decision tree as an example to illustrate the above formula:
[0137] The depth (Q) of the first decision tree is set to 2 (initial parameter value G). Every 3 trees, some parameters are adjusted (step length M), with the depth increasing by 1 each time (step size set to N). The maximum tree depth is 100 (upper and lower bounds F of the parameters). Then, the formula for the tree depth in each iteration can be expressed as:
[0138] Q = 2 + 1 * (C // 3), if Q ≤ 100;
[0139] Q = 100, otherwise.
[0140] That is, the depth of the first three trees is 2, the depth of the fourth to sixth trees is 3, and so on; when the depth of the tree is greater than or equal to 100, the parameters no longer change, and the depth of all decision trees thereafter is defined as 100.
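The stepwise schedule can be written as a short function. This is a sketch that assumes the iteration counter C starts at 0, so that the first three trees share the initial depth, and that the sign of N distinguishes the increasing case from the decreasing case. The function name and the second example's values are illustrative.

```python
def scheduled_value(G, N, M, F, C):
    """Stepwise schedule Q = G + N * (C // M), clamped at the bound F.

    G: initial value, N: change per step (negative for decreasing parameters),
    M: number of iterations between adjustments, F: upper or lower bound,
    C: current iteration number, counted from 0 (an assumption).
    """
    q = G + N * (C // M)
    # Increasing parameters are capped from above, decreasing ones from below.
    return min(q, F) if N >= 0 else max(q, F)


# Tree depth example from the text: start at 2, +1 every 3 trees, at most 100.
depths = [scheduled_value(G=2, N=1, M=3, F=100, C=c) for c in range(7)]

# A decreasing parameter with illustrative values: minimum leaf data size
# starting at 100, reduced by 10 every 5 trees, never below 20.
min_leaf_sizes = [scheduled_value(G=100, N=-10, M=5, F=20, C=c) for c in (0, 5, 1000)]
```

A schedule like this would have to be applied between boosting rounds by whatever training loop is used. How that is wired into LightGBM is not described in this text and is not shown here.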
[0141] The training process for the malicious domain name detection model can be found in the training method of LightGBM in existing technologies, and will not be repeated here.
[0142] The malicious domain name detection method disclosed in this application obtains a given malicious domain name and a domain name to be detected, as well as preset domain name features for each of the given malicious domain name and the domain name to be detected, based on domain name association data. Based on the preset domain name features and the given malicious domain name, a malicious score is obtained for the domain name to be detected. Then, the malicious score and the preset domain name features are input into a pre-trained Long Short-Term Memory (LSTM) network to obtain the hidden layer features output by the LSTM network after feature encoding of the current input. Finally, the malicious score, the preset domain name features, and the hidden layer features are input into a pre-trained malicious domain name detection model to obtain the domain name detection result for the domain name to be detected. This method, by combining two different types of models (machine learning model and deep learning model), can not only capture the features of the temporal data of domain names but also avoid the influence of sparse data, thus helping to improve the accuracy and reliability of domain name detection.
[0143] Furthermore, this method performs global association analysis of domains based on preset domain characteristics, constructs a domain name resolution graph and an undirected weighted domain name graph, thereby obtaining a malicious score that expresses the degree of association between the domain to be detected and the malicious domain. Then, malicious domain detection is performed based on preset domain characteristics and malicious scores. Compared with traffic-based or direct blacklist comparison, the features obtained by global association analysis are more comprehensive and not easily bypassed. Therefore, the domain detection results are more accurate and reliable.
[0144] Accordingly, this application also discloses a malicious domain name detection device. As shown in Figure 5, the device includes:
[0145] The domain name and domain name feature acquisition module 510 is used to acquire a given malicious domain name and a domain name to be detected, as well as the preset domain name features of the given malicious domain name and the domain name to be detected, based on the domain name association data.
[0146] The malicious score acquisition module 520 is used to acquire the malicious score of the domain to be detected based on preset domain characteristics and a given malicious domain, wherein the malicious score is used to represent the degree of association between the domain to be detected and the given malicious domain.
[0147] The hidden layer feature acquisition module 530 is used to input the malicious score and the preset domain name features into a pre-trained long short-term memory network to obtain the hidden layer features output by the long short-term memory network after feature encoding for the current input.
[0148] The domain name detection module 540 is used to input the malicious score, the preset domain name features and the hidden layer features into a pre-trained malicious domain name detection model to obtain the domain name detection result of the domain name to be detected.
[0149] Optionally, the malicious score acquisition module 520 is further configured to:
[0150] Based on the domain name and IP address resolution mapping relationship expressed by the preset domain name features, edges are constructed to obtain a domain name resolution graph, wherein the edges are used to connect domain names and IP addresses that have a resolution mapping relationship;
[0151] Based on the domain name resolution graph, an undirected weighted domain name graph is constructed, wherein the edges in the undirected weighted domain name graph connect pairs of domain names for which the number of common IP addresses they resolve to is greater than 1, and the weight of an edge in the undirected weighted domain name graph is determined based on the two domain names connected by the edge and the number of common IP addresses that the two domain names resolve to.
[0152] Based on the association between the domain name to be detected and the given malicious domain name as described in the undirected weighted domain name graph, the maliciousness score of the domain name to be detected is obtained.
[0153] Optionally, the weights of the edges in the undirected weighted domain name graph are determined by the following method:
[0154] If the two domains connected by an edge are the same, then the weight of the edge is determined to be 1.
[0155] If the two domain names connected by the edge are different, the weight of the edge is determined to be a value that is positively correlated with the number of common IP addresses resolved by the two domain names connected by the edge, and less than 1.
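The graph construction and the edge weight rule above can be sketched as follows. The text only requires that the weight for two different domain names be less than 1 and positively correlated with the number of common IP addresses. The specific form 1 - 1/(1 + n) below is an assumption chosen to satisfy that requirement, and the function names, domain names, and IP addresses are illustrative.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def edge_weight(a: str, b: str, common_ips: int) -> float:
    """Weight 1 for identical domains; otherwise a value in (0, 1) growing with n."""
    if a == b:
        return 1.0
    return 1.0 - 1.0 / (1.0 + common_ips)  # assumed functional form


def build_domain_graph(resolutions: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Connect domain pairs that resolve to more than one common IP address.

    resolutions maps each domain name to the set of IP addresses it resolved
    to, i.e. the information carried by the domain name resolution graph.
    """
    graph: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(resolutions), 2):
        n = len(resolutions[a] & resolutions[b])
        if n > 1:  # "greater than 1" as stated in the text
            graph[(a, b)] = edge_weight(a, b, n)
    return graph


resolutions = {
    "a.example": {"192.0.2.1", "192.0.2.2", "192.0.2.3"},
    "b.example": {"192.0.2.1", "192.0.2.2"},
    "c.example": {"192.0.2.3"},
}
domain_graph = build_domain_graph(resolutions)
```

Only `a.example` and `b.example` share more than one IP address here, so the resulting graph has a single edge.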
[0156] Optionally, based on the association between the domain to be detected and the given malicious domain described in the undirected weighted domain name graph, the maliciousness score of the domain to be detected is obtained, including:
[0157] Based on the edge directly connecting the domain name to be detected and the given malicious domain name in the undirected weighted domain name graph, and the weight of the edge, the first association relationship between the domain name to be detected and the given malicious domain name is obtained.
[0158] Based on the undirected weighted domain name graph, the path formed by at least two edges indirectly connecting the domain name to be detected and the given malicious domain name, and the weight of the path, a second association relationship between the domain name to be detected and the given malicious domain name is obtained.
[0159] Based on the first association relationship and the second association relationship, obtain the maliciousness score of the domain name to be detected.
[0160] Optionally, obtaining the first association between the domain to be detected and the given malicious domain based on the edge directly connecting the domain to be detected and the given malicious domain in the undirected weighted domain graph, and the weight of the edge, includes:
[0161] Traverse the undirected weighted domain name graph to obtain all edges that directly connect the domain name to be detected to the given malicious domain name;
[0162] Based on the weights of all the edges obtained, the first association between the domain name to be detected and the given malicious domain name is obtained.
[0163] Optionally, based on the path formed by at least two edges indirectly connecting the domain to be detected and the given malicious domain in the undirected weighted domain graph, and the weight of the path, a second association relationship between the domain to be detected and the given malicious domain is obtained, including:
[0164] Traverse the undirected weighted domain name graph to obtain all paths that connect the domain name to be detected to the given malicious domain name and consist of at least two edges;
[0165] Calculate the weight of the path based on the weight of the edges that make up each path;
[0166] For each given malicious domain, the path with the highest weight connecting the domain to be detected and the given malicious domain is taken as the target path connecting the domain to be detected and the given malicious domain.
[0167] Based on the weights corresponding to the target paths connecting the domain to be detected and each given malicious domain, a second association relationship between the domain to be detected and the given malicious domain is obtained.
[0168] Optionally, the first association relationship includes: the weight of the edge connecting the domain to be detected and each identifiable malicious domain; the second association relationship includes: the weight of the target path connecting the domain to be detected and each identifiable malicious domain; and obtaining the maliciousness score of the domain to be detected based on the first association relationship and the second association relationship includes:
[0169] By combining the weights of the edges in the first association relationship and the weights of the target path in the second association relationship, the maliciousness score of the domain name to be detected is calculated.
[0170] Optionally, obtaining the given malicious domain name and the domain name to be detected, as well as the preset domain name characteristics of the given malicious domain name and the domain name to be detected, based on domain name association data, includes:
[0171] Obtain domain association data, which includes one or more of the following: passive DNS data, blacklist data, domain historical behavior data, and DGA domain data;
[0172] The domain association data is preprocessed to obtain preprocessed data, wherein the data preprocessing includes at least: deleting passive DNS data and domain historical behavior data corresponding to high-popularity domains;
[0173] Based on the preprocessed data, obtain the given malicious domain name and the domain name to be detected, as well as the preset domain name features of each of the given malicious domain name and the domain name to be detected.
[0174] The malicious domain name detection device disclosed in this application is used to implement the malicious domain name detection method described in this application. The specific implementation methods of each module of the device will not be repeated here, but can be found in the specific implementation methods of the corresponding steps in the method embodiments.
[0175] This application discloses a malicious domain name detection device. It obtains a given malicious domain name and a domain name to be detected, along with preset domain name features for each, based on domain name association data. Based on the preset domain name features and the given malicious domain name, it obtains a malicious score for the domain name to be detected. Then, it inputs the malicious score and the preset domain name features into a pre-trained Long Short-Term Memory (LSTM) network to obtain hidden features output by the LSTM network after feature encoding of the current input. Finally, it inputs the malicious score, the preset domain name features, and the hidden features into a pre-trained malicious domain name detection model to obtain the domain name detection result. This device, by combining two different types of models (machine learning model and deep learning model), can not only capture the features of the temporal data of domain names but also avoid the influence of sparse data, thus helping to improve the accuracy and reliability of domain name detection.
[0176] Furthermore, this device performs global association analysis of domain names based on preset domain name features, constructs a domain name resolution graph and an undirected weighted domain name graph, thereby obtaining a malicious score that expresses the degree of association between the domain name to be detected and the malicious domain name. Then, malicious domain name detection is performed based on preset domain name features and malicious score. Compared with traffic-based or direct blacklist comparison, the features obtained by global association analysis are more comprehensive and not easily bypassed. Therefore, the domain name detection results are more accurate and reliable.
[0177] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the apparatus embodiments, since they are fundamentally similar to the method embodiments, the description is relatively simple; relevant parts can be referred to the descriptions in the method embodiments.
[0178] The above provides a detailed description of a malicious domain name detection method and apparatus provided in this application. Specific examples have been used to illustrate the principles and implementation methods of this application. The description of the above embodiments is only for the purpose of helping to understand the method of this application and its core idea. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the idea of this application. Therefore, the content of this specification should not be construed as a limitation of this application.
[0179] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0180] The various component embodiments of this application can be implemented in hardware, or as software modules running on one or more processors, or a combination thereof. Those skilled in the art will understand that microprocessors or digital signal processors (DSPs) can be used in practice to implement some or all of the functions of some or all of the components in the electronic device according to the embodiments of this application. This application can also be implemented as a device or apparatus program (e.g., a computer program and computer program product) for performing part or all of the methods described herein. Such a program implementing this application can be stored on a computer-readable medium, or can be in the form of one or more signals. Such signals can be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
[0181] For example, Figure 6 shows an electronic device that can implement the methods according to this application. The electronic device may be a PC, mobile terminal, personal digital assistant, tablet computer, etc. The electronic device conventionally includes a processor 610, a memory 620, and program code 630 stored on the memory 620 and executable on the processor 610; when the processor 610 executes the program code 630, the methods described in the above embodiments are implemented. The memory 620 may be a computer program product or a computer-readable medium. The memory 620 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read-Only Memory), EPROM, hard disk, or ROM. The memory 620 has a storage space 6201 for the program code 630 of a computer program for performing any of the method steps described above. For example, the storage space 6201 for the program code 630 may include various computer programs for implementing the various steps in the methods described above. The program code 630 is computer-readable code. These computer programs can be read from or written to one or more computer program products. These computer program products include program code carriers such as hard disks, CDs, memory cards, or floppy disks. The computer program includes computer-readable code that, when executed on an electronic device, causes the electronic device to perform the method according to the above embodiments.
[0182] This application also discloses a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the steps of the malicious domain name detection method as described in this application.
[0183] Such a computer program product can be a computer-readable storage medium, which can have storage segments, storage spaces, etc. arranged similarly to the memory 620 in the electronic device shown in Figure 6. Program code can be stored, for example, in a compressed form on the computer-readable storage medium. The computer-readable storage medium is typically a portable or fixed storage unit as described with reference to Figure 7. Typically, the storage unit includes computer-readable code 630', which is code that can be read by a processor and that, when executed by the processor, implements the various steps of the method described above.
[0184] The terms "an embodiment," "embodiment," or "one or more embodiments" as used herein mean that a particular feature, structure, or characteristic described in connection with an embodiment is included in at least one embodiment of this application. Furthermore, instances of the phrase "in one embodiment" do not necessarily all refer to the same embodiment.
[0185] Numerous specific details are set forth in the specification provided herein. However, it will be understood that embodiments of this application may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this specification.
[0186] In the claims, any reference signs placed between parentheses should not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in the claims. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. This application can be implemented by means of hardware comprising several different elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by the same item of hardware. The use of the words first, second, and third, etc., does not indicate any order. These words can be interpreted as names.
[0187] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application.
Claims
1. A method for detecting malicious domain names, characterized in that the method includes: based on domain name association data, obtaining a given malicious domain name and a domain name to be detected, as well as the preset domain name features of each of the given malicious domain name and the domain name to be detected; based on the preset domain name features and the given malicious domain name, obtaining a maliciousness score of the domain name to be detected; inputting the maliciousness score and the preset domain name features into a pre-trained long short-term memory network to obtain the hidden layer features output by the long short-term memory network after feature encoding of the current input; and inputting the maliciousness score, the preset domain name features, and the hidden layer features into a pre-trained malicious domain name detection model to obtain the domain name detection result of the domain name to be detected; wherein the step of obtaining the maliciousness score of the domain name to be detected based on the preset domain name features and the given malicious domain name includes: constructing edges based on the resolution mapping relationship between domain names and IP addresses expressed by the preset domain name features to obtain a domain name resolution graph, wherein the edges are used to connect domain names and IP addresses that have a resolution mapping relationship; constructing an undirected weighted domain name graph based on the domain name resolution graph, wherein the edges in the undirected weighted domain name graph are used to connect pairs of domain names whose number of shared resolved IP addresses is greater than one, and the weight of each edge in the undirected weighted domain name graph is determined as follows, based on the two domain names connected by the edge and the number of shared IP addresses resolved by those two domain names: if the two domain names connected by the edge are the same, the weight of the edge is determined to be 1; if the two domain names connected by the edge are different, the weight of the edge is determined to be a value that is positively correlated with the number of shared IP addresses resolved by the two domain names connected by the edge and that is less than 1; obtaining a first association relationship between the domain name to be detected and the given malicious domain name based on the edge in the undirected weighted domain name graph that directly connects the domain name to be detected and the given malicious domain name, and the weight of that edge; obtaining a second association relationship between the domain name to be detected and the given malicious domain name based on the path in the undirected weighted domain name graph that is formed by at least two edges and indirectly connects the domain name to be detected and the given malicious domain name, and the weight of that path; and obtaining the maliciousness score of the domain name to be detected based on the first association relationship and the second association relationship.
2. The method according to claim 1, characterized in that the step of obtaining the first association relationship between the domain name to be detected and the given malicious domain name, based on the edge in the undirected weighted domain name graph that directly connects the domain name to be detected and the given malicious domain name, and the weight of that edge, includes: traversing the undirected weighted domain name graph to obtain all edges that directly connect the domain name to be detected to the given malicious domain name; and obtaining the first association relationship between the domain name to be detected and the given malicious domain name based on the weights of all the edges obtained.
3. The method according to claim 1, characterized in that the step of obtaining the second association relationship between the domain name to be detected and the given malicious domain name, based on the path in the undirected weighted domain name graph that is formed by at least two edges and indirectly connects the domain name to be detected and the given malicious domain name, and the weight of that path, includes: traversing the undirected weighted domain name graph to obtain all paths that connect the domain name to be detected to the given malicious domain name and consist of at least two edges; calculating the weight of each path based on the weights of the edges that make up the path; for each given malicious domain name, taking the path with the highest weight connecting the domain name to be detected and the given malicious domain name as the target path connecting the domain name to be detected and the given malicious domain name; and obtaining the second association relationship between the domain name to be detected and the given malicious domain name based on the weights of the target paths connecting the domain name to be detected and each given malicious domain name.
4. The method according to claim 1, characterized in that the first association relationship includes: the weights of the edges connecting the domain name to be detected and each given malicious domain name; the second association relationship includes: the weights of the target paths connecting the domain name to be detected and each given malicious domain name; and the step of obtaining the maliciousness score of the domain name to be detected based on the first association relationship and the second association relationship includes: calculating the maliciousness score of the domain name to be detected by combining the weights of the edges in the first association relationship and the weights of the target paths in the second association relationship.
5. The method according to claim 1, characterized in that the step of obtaining a given malicious domain name and a domain name to be detected, as well as the preset domain name features of each of the given malicious domain name and the domain name to be detected, based on domain name association data, includes: obtaining domain name association data, which includes one or more of the following: passive DNS data, blacklist data, domain name historical behavior data, and DGA domain name data; preprocessing the domain name association data to obtain preprocessed data, wherein the data preprocessing includes at least: deleting the passive DNS data and domain name historical behavior data corresponding to high-popularity domain names; and, based on the preprocessed data, obtaining the given malicious domain name and the domain name to be detected, as well as the respective preset domain name features of the given malicious domain name and the domain name to be detected.
6. A malicious domain name detection device, characterized in that the device includes: a domain name and domain name feature acquisition module, used to obtain a given malicious domain name and a domain name to be detected, as well as the preset domain name features of each of the given malicious domain name and the domain name to be detected, based on domain name association data; a maliciousness score acquisition module, used to obtain a maliciousness score of the domain name to be detected based on the preset domain name features and the given malicious domain name; a hidden layer feature acquisition module, used to input the maliciousness score and the preset domain name features into a pre-trained long short-term memory network to obtain the hidden layer features output by the long short-term memory network after feature encoding of the current input; and a domain name detection module, used to input the maliciousness score, the preset domain name features, and the hidden layer features into a pre-trained malicious domain name detection model to obtain the domain name detection result of the domain name to be detected; wherein the maliciousness score acquisition module is further configured to: construct edges based on the resolution mapping relationship between domain names and IP addresses expressed by the preset domain name features to obtain a domain name resolution graph, wherein the edges are used to connect domain names and IP addresses that have a resolution mapping relationship; construct an undirected weighted domain name graph based on the domain name resolution graph, wherein the edges in the undirected weighted domain name graph are used to connect pairs of domain names whose number of shared resolved IP addresses is greater than one, and the weight of each edge in the undirected weighted domain name graph is determined as follows, based on the two domain names connected by the edge and the number of shared IP addresses resolved by those two domain names: if the two domain names connected by the edge are the same, the weight of the edge is determined to be 1; if the two domain names connected by the edge are different, the weight of the edge is determined to be a value that is positively correlated with the number of shared IP addresses resolved by the two domain names connected by the edge and that is less than 1; obtain a first association relationship between the domain name to be detected and the given malicious domain name based on the edge in the undirected weighted domain name graph that directly connects the domain name to be detected and the given malicious domain name, and the weight of that edge; obtain a second association relationship between the domain name to be detected and the given malicious domain name based on the path in the undirected weighted domain name graph that is formed by at least two edges and indirectly connects the domain name to be detected and the given malicious domain name, and the weight of that path; and obtain the maliciousness score of the domain name to be detected based on the first association relationship and the second association relationship.
7. An electronic device, comprising a memory, a processor, and program code stored in the memory and executable on the processor, characterized in that, when the processor executes the program code, it implements the malicious domain name detection method according to any one of claims 1 to 5.
8. A computer-readable storage medium having program code stored thereon, characterized in that, when the program code is executed by a processor, it implements the steps of the malicious domain name detection method according to any one of claims 1 to 5.
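As an illustrative aid only, and not as part of the claims, the graph-based scoring recited in claims 1 to 4 can be sketched in Python as follows. The claims do not fix any formula, so several choices here are assumptions: the edge weight 1 - 1/(1 + n) for n shared IP addresses (one possible value that grows with n and stays below 1), the path weight taken as the product of its edge weights, and the score taken as the sum of the direct edge weight and the best indirect path weight over all given malicious domain names. The function names, the `min_shared` parameter (reflecting the reading that an edge requires more than one shared IP address), and the example domain names and addresses are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_domain_graph(resolutions, min_shared=2):
    """Build an undirected weighted domain graph from (domain, ip) pairs."""
    ips_by_domain = defaultdict(set)
    for domain, ip in resolutions:
        ips_by_domain[domain].add(ip)

    graph = {domain: {} for domain in ips_by_domain}
    for a, b in combinations(sorted(ips_by_domain), 2):
        shared = len(ips_by_domain[a] & ips_by_domain[b])
        if shared >= min_shared:
            # Assumed formula: grows with the shared-IP count, always < 1.
            weight = 1.0 - 1.0 / (1.0 + shared)
            graph[a][b] = weight
            graph[b][a] = weight
    return graph


def edge_weight(graph, a, b):
    """Weight of the direct edge; identical domains get weight 1."""
    if a == b:
        return 1.0
    return graph.get(a, {}).get(b, 0.0)


def best_indirect_path_weight(graph, source, target):
    """Highest path weight over simple paths with at least two edges.

    Assumption: a path's weight is the product of its edge weights.
    Exhaustive search, suitable only for small illustrative graphs.
    """
    best = 0.0

    def dfs(node, visited, weight, edges):
        nonlocal best
        for neighbor, w in graph.get(node, {}).items():
            if neighbor in visited:
                continue
            if neighbor == target:
                if edges + 1 >= 2:
                    best = max(best, weight * w)
                continue
            dfs(neighbor, visited | {neighbor}, weight * w, edges + 1)

    dfs(source, {source}, 1.0, 0)
    return best


def maliciousness_score(graph, candidate, malicious_domains):
    """Assumed combination: sum of direct and best indirect weights."""
    score = 0.0
    for bad in malicious_domains:
        score += edge_weight(graph, candidate, bad)
        score += best_indirect_path_weight(graph, candidate, bad)
    return score


# Hypothetical resolution records using reserved documentation addresses.
RESOLUTIONS = [
    ("bad.example", "192.0.2.1"),
    ("bad.example", "192.0.2.2"),
    ("bad.example", "192.0.2.3"),
    ("mid.example", "192.0.2.2"),
    ("mid.example", "192.0.2.3"),
    ("mid.example", "198.51.100.1"),
    ("mid.example", "198.51.100.2"),
    ("new.example", "198.51.100.1"),
    ("new.example", "198.51.100.2"),
]

graph = build_domain_graph(RESOLUTIONS)
print(maliciousness_score(graph, "new.example", ["bad.example"]))
```

In this example, `new.example` shares no IP address with `bad.example`, so its score comes entirely from the two-edge path through `mid.example`. In the claimed method, such a score would then be passed, together with the preset domain name features, to the long short-term memory network and the detection model, which this sketch does not cover.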