A malicious domain name real-time identification method, device and system based on fingerprint correlation

By establishing an interception fingerprint database on the interception device and calculating the multi-dimensional feature similarity of domain names, the problems of delay and insufficient accuracy in intercepting malicious domain names are solved, enabling real-time identification and interception of malicious redirecting domain names and improving the protection effect.

CN122247737A · Pending · Publication Date: 2026-06-19 · NANJING SINOVATIO TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
NANJING SINOVATIO TECHNOLOGY CO LTD
Filing Date
2026-04-23
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing malicious domain name blocking solutions cannot identify redirected domain names in real time, resulting in problems such as blocking delays and insufficient identification accuracy, making it difficult to adapt to the dynamic changes of malicious domain names.

Method used

By establishing an interception fingerprint database on the interception device, extracting client fingerprint features and calculating the multi-dimensional feature similarity of domain names, real-time identification and interception of malicious redirection domain names can be achieved, including the extraction of client IP, protocol type, TLS fingerprint, and HTTP request header structure fingerprint, as well as the calculation of multi-dimensional feature similarity.

Benefits of technology

It achieves millisecond-level real-time identification and interception of malicious redirecting domains, improving identification accuracy, reducing false alarm rate, raising the technical threshold for attackers, and forming an asymmetric advantage in attack and defense.

✦ Generated by Eureka AI based on patent content.

Figure CN122247737A_ABST

Abstract

This invention discloses a method, device, and system for real-time identification of malicious domain names based on fingerprint association. The method includes: acquiring a first access request; if the first domain name matches a malicious domain name database, intercepting the first access request, extracting the fingerprint features of a first client, and storing the association between the first client fingerprint features and the first domain name in an interception fingerprint database; acquiring a second access request; if the second domain name does not match the malicious domain name database, extracting the fingerprint features of a second client and querying the interception fingerprint database using the second client fingerprint features; if the query matches the first client fingerprint features, acquiring the associated stored first domain name and calculating the multi-dimensional feature similarity score between the second domain name and the first domain name; and if the score exceeds a preset threshold, determining the second domain name to be a malicious redirect domain name and intercepting it in real time. Through fingerprint association, the invention immediately recognizes the same attacker switching to a new domain name and blocks it without waiting for the domain name to be entered into the database, improving the accuracy of identifying dynamically changing malicious domain names.

Description

Technical Field

[0001] This invention relates to the field of network security and traffic interception technology, specifically to a method, device, and system for real-time identification of malicious domain names based on fingerprint association.

Background Technology

[0002] Currently, the mainstream technical solution for malicious domain name interception is a collaborative architecture of "analysis device - interception device": the analysis device maintains a malicious domain name database and sends it to the interception device in real time. After the interception device receives the raw traffic, it extracts the request domain name and matches it with the malicious domain name database. If the match is successful, the malicious request is intercepted; otherwise, the traffic is allowed to pass.

[0003] However, to bypass the aforementioned static interception strategy, cyber attackers have pre-set an automatic failover mechanism: when malicious domain A is blocked, the client will automatically redirect to an unlisted malicious domain B. If domain B is blocked, it will then redirect to domain C, domain D, and so on. For this redirection escape scenario, traditional solutions rely on malware analysis equipment to discover new redirect domains (such as domain B) through other means. After analysis confirms its malicious nature, it is added to the malicious domain database and sent to the interception equipment to achieve the interception of the new malicious domain.

[0004] The limitations of traditional solutions lie in their passivity and lag:
(1) Response delay: from the moment domain B becomes active until the security system captures, analyzes, confirms, and blacklists it, there is a significant time window in which attackers or criminals can carry out malicious acts.
(2) Endless chasing: attackers or criminals can prepare a massive number of backup domains (B, C, D…). The security system's blocking always lags behind the domain switching, forming a passive cycle of "blocking A -> redirecting to B -> blocking B -> redirecting to C…", which limits protective effectiveness.

[0005] Current solutions improve the monitoring and collection of malicious domains by optimizing the identification model on the analysis equipment and pushing the detection results to the interception system for processing. While this approach enhances the analysis equipment's ability to identify malicious domains, it does not address the practical problems of delayed response and endless chasing. A technical solution that can identify malicious domains in real time based on fingerprint association is therefore urgently needed.

Summary of the Invention

[0006] Purpose of the invention: Existing malicious domain name blocking schemes cannot identify redirecting domains in real time, suffer from blocking delays, and adapt poorly to the dynamic changes of malicious domains, which results in insufficient identification accuracy. To solve these technical problems, this invention provides a method, device, and system for real-time identification of malicious domains based on fingerprint association, which can accurately identify and block new redirecting domains locally at the interception node before they are included in the malicious domain name database.

[0007] Technical solution: A method for real-time identification of malicious domain names based on fingerprint association, comprising the following steps:

[0008] Obtain the first access request, extract the first domain name, and if the first domain name matches the malicious domain name database, intercept the first access request, extract the fingerprint feature of the first client that initiated the first access request, and store the association between the first client fingerprint feature and the first domain name in the interception fingerprint database.

[0009] A second access request is obtained, a second domain name is extracted, and if the second domain name does not match the malicious domain name database, the fingerprint feature of the second client that initiated the second access request is extracted. The interception fingerprint database is queried using the second client fingerprint feature as the search keyword. If the query matches the first client fingerprint feature, the first domain name associated with the first client fingerprint feature is obtained; wherein the first client fingerprint feature and the second client fingerprint feature are used to characterize the access behavior initiated by the same client at different times.

[0010] Calculate the multidimensional feature similarity score between the second domain name and the first domain name. The similarity score is obtained by calculating the feature mapping of the domain name in three dimensions: literal, statistical, and structural.

[0011] When the similarity score exceeds a preset threshold, the second domain name is determined to be a malicious redirect domain name, and the second access request is blocked in real time.

[0012] Furthermore, the first client fingerprint characteristics and the second client fingerprint characteristics include: client IP, protocol type, TLS fingerprint, HTTP request header structure fingerprint, and interception time.

[0013] Furthermore, calculating the multidimensional feature similarity score between the second domain and the first domain includes: calculating the weighted edit distance similarity between the second domain and the first domain, where both the second domain and the first domain are divided into core identifier segments and auxiliary feature segments;

[0014] When calculating the edit distance between the second domain name and the first domain name, the character differences of the core identifier segment are assigned a first weight, and the character differences of the auxiliary feature segment are assigned a second weight, with the first weight being greater than the second weight.

[0015] Furthermore, calculating the multidimensional feature similarity score between the second domain and the first domain includes: calculating the character information entropy offset between the second domain and the first domain, as follows:

[0016] Calculate the Shannon entropy of the second domain name and the first domain name strings respectively;

[0017] Calculate the offset between the two entropy values, and use it as the information entropy offset.

[0018] If both entropy values are within the preset high-entropy range and the offset is less than the preset offset threshold, the two domain names are determined to have been generated by the same source algorithm.

[0019] Furthermore, calculating the multidimensional feature similarity score between the second domain and the first domain includes: calculating the character pattern structure fingerprint similarity between the second domain and the first domain, as follows:

[0020] The characters of the second domain name and the first domain name are mapped to category codes according to their attributes, forming a topological structure sequence; the character attributes include at least letter, number, and symbol categories;

[0021] By comparing the topological sequence of two domain names, if they are completely identical, a first similarity score is assigned; if they are partially identical, a second similarity score is assigned according to the proportion of matching positions.

[0022] Furthermore, the multidimensional feature similarity score is calculated using the following formula:

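[0023] $S = \alpha \cdot S_{ED} + \beta \cdot S_{H} + \gamma \cdot S_{P}$

[0024] where $S_{ED}$ is the weighted edit distance similarity, $S_{H}$ is the information entropy offset score, $S_{P}$ is the structural fingerprint similarity, and $\alpha$, $\beta$ and $\gamma$ are preset weighting coefficients whose sum is 1.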

[0025] Furthermore, the preset threshold is a dynamically adjusted threshold, which is negatively correlated with the attack frequency per unit time. The adjustment method is as follows:

[0026] Count the number of intercept logs that match the fingerprint features of the second client within a preset time window;

[0027] The preset threshold is dynamically lowered as the number of hits increases and raised as it decreases.

[0028] Furthermore, the method also includes:

[0029] If the second domain is determined to be a malicious redirect domain, the second domain and its corresponding multi-dimensional feature similarity score are pushed to the malicious domain analysis center to update the malicious domain database.

[0030] A real-time malicious domain name identification device based on fingerprint association includes:

[0031] The fingerprint database construction module is used to obtain the first access request, extract the first domain name, intercept the first access request if the first domain name matches the malicious domain name database, extract the fingerprint features of the first client that initiated the first access request, and store the association between the fingerprint features of the first client and the first domain name in the interception fingerprint database.

[0032] The fingerprint association matching module is used to obtain a second access request and extract a second domain name; if the second domain name does not match the malicious domain name database, to extract the fingerprint features of the second client that initiated the second access request and query the interception fingerprint database using them as the search keyword; and, if the query matches the fingerprint features of the first client, to obtain the first domain name associated with the fingerprint features of the first client; wherein the fingerprint features of the first client and of the second client characterize the access behavior initiated by the same client at different times;

[0033] The multidimensional feature scoring module is used to calculate the multidimensional feature similarity score between the second domain name and the first domain name. The similarity score is calculated by the feature mapping of the domain name in three dimensions: literal, statistical and structural.

[0034] The identification and judgment module is used to determine that the second domain name is a malicious redirect domain name when the similarity score exceeds a preset threshold, and to intercept the second access request in real time.

[0035] An electronic device includes: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the programs, when executed by the processors, implement the fingerprint-based real-time identification method for malicious domain names as described above.

[0036] A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the fingerprint-based real-time identification method for malicious domain names as described above.

[0037] A computer program product includes a computer program that, when executed by a processor, implements the fingerprint-based real-time malicious domain name identification method described above.

[0038] A real-time identification and interception system for malicious redirecting domains includes: a malicious analysis device, which is used to maintain and distribute a malicious domain database to the interception device, and is also used to receive malicious redirecting domains and associated interception log information reported by the interception device, add the malicious redirecting domains to the malicious domain database after completing the analysis and confirmation, and distribute the updated malicious domain database to all interception devices.

[0039] The interception device is used to perform the fingerprint-based real-time identification method for malicious domains as described above.

[0040] The present invention has the following beneficial effects:

[0041] (1) By storing the association between client fingerprint features and malicious domain names, an interception fingerprint database is established, enabling precise characterization of attackers. When the same attacker initiates subsequent requests with a new domain name, even if the new domain name has not yet been included in any malicious domain name database, the system can quickly identify that the request originates from the same source as historical malicious behavior by matching the second client fingerprint feature with the first client fingerprint feature in the interception fingerprint database, thereby completing real-time identification and interception of malicious redirection domain names within milliseconds. Compared with the traditional solution's delayed mode of waiting for malicious domain names to be reported, analyzed, added to the database, and issued before taking effect, this invention fundamentally eliminates the interception delay and achieves the protection capability of intercepting new domain names as soon as they appear.

[0042] (2) By calculating the multi-dimensional feature similarity score between the second domain and the first domain, the invention builds a precise identification mechanism for domain evasion behavior. To bypass blocking, attackers typically modify the original malicious domain only slightly (for example by changing a sequence number, randomizing the suffix, or replacing characters) to generate a redirect domain. Such domains remain highly similar to the original in character structure, information entropy distribution, or pattern fingerprint. By computing similarity across the literal, statistical, and structural dimensions, this invention captures this homology and can still determine that a slightly modified domain is a malicious redirect domain. This mechanism significantly improves identification accuracy against the massive numbers of evasion domains generated by automated tools and significantly reduces false positives and false negatives.

[0043] (3) Dual verification through fingerprint matching and domain name similarity calculation raises the technical threshold for attackers to bypass the system. To evade identification by this invention, an attacker must not only change the domain name but also alter the complete set of client fingerprint features, and must further ensure that the new domain name is not similar to the original domain name in its multi-dimensional features. This multi-dimensional, multi-level identification mechanism forces attackers to change both their attack toolchain and their domain name generation strategy every time they try to escape, which greatly increases the cost of attacks, reverses the passive "block once, bypassed once" situation of traditional defenses, and creates a significant asymmetric advantage in offense and defense.

Attached Figure Description

[0044] Figure 1 is the overall architecture diagram of the fingerprint association-based real-time malicious domain name identification system of the present invention;

[0045] Figure 2 is a flowchart of the real-time identification method for malicious domains based on fingerprint association according to the present invention.

Detailed Implementation

[0046] The technical solution of the present invention will be further described below with reference to the accompanying drawings.

[0047] Existing technologies cannot utilize the behavioral characteristics of redirecting domains for real-time identification, resulting in significant shortcomings in intercepting malicious domain redirection escapes. This makes it difficult to meet the core requirements of real-time blocking and precise prevention and control in network security protection. This invention provides a real-time identification method for malicious domains based on fingerprint association. By mining the inherent behavioral association characteristics between malicious main domains and redirecting domains, a real-time identification scheme is constructed. It can complete accurate identification and interception before new redirecting domains are included in the malicious domain database, breaking through the lag limitation of existing schemes that rely on domains being added to the database.

[0048] Figure 1 is a diagram of the overall architecture of the fingerprint association-based real-time malicious domain identification system of the present invention. The system includes a malicious analysis device and an interception device, with a data interaction link established between the two, and the proposed real-time identification method runs on the interception device. The interception device receives a first access request and extracts a first domain name from it. If the first domain name matches the malicious domain name database, the device intercepts the first access request, extracts the fingerprint features of the first client that initiated it, and stores the association between the first client fingerprint features and the first domain name in the interception fingerprint database, which provides the base data for subsequent redirect domain name identification. The device then receives a second access request and extracts a second domain name from it. If the second access request is initiated by the same client using a slightly modified domain name, the second domain name will not match the malicious domain name database. In this case, the interception device extracts the fingerprint features of the second client that initiated the second access request and queries the interception fingerprint database using them as the search keyword. If the query matches the first client fingerprint features, the first domain name associated with those features is retrieved, and the multi-dimensional feature similarity score between the second domain name and the first domain name is calculated. If the score exceeds a preset threshold, the second domain name is determined to be a malicious redirect domain name and is intercepted in real time. The first and second client fingerprint features characterize access behaviors initiated by the same client at different times.
Identification and blocking based on this dual verification of client fingerprint association and domain name similarity are completed locally and in real time on the interception device, without waiting for the malicious analysis device to enter the domain name into the database and distribute it, so redirect domain names are blocked immediately. The malicious analysis device maintains the malicious domain name database and distributes it to the interception devices; it also receives malicious redirect domain names and related information reported by the interception devices and, after verifying and confirming them, adds them to the malicious domain name database and distributes the updated database to all interception devices.
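The reporting loop between the interception device and the malicious analysis device can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `RedirectReport` payload fields, the JSON encoding, and the `AnalysisCenter` class with its `confirm` callback are hypothetical names introduced here, not an interface defined by the patent; the callback merely stands in for whatever analysis and confirmation the analysis device actually performs.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, List, Set


@dataclass
class RedirectReport:
    """Hypothetical report sent by an interception device for a detected redirect domain."""
    redirect_domain: str
    original_domain: str
    score: float
    client_ip: str
    tls_fp: str
    detected_at: float


class AnalysisCenter:
    """Hypothetical stand-in for the malicious analysis device."""

    def __init__(self, malicious_domains: Set[str]) -> None:
        self.malicious_domains = set(malicious_domains)
        self.pending: List[RedirectReport] = []

    def receive(self, payload: str) -> None:
        # Decode a JSON report pushed by an interception device and queue it for analysis.
        self.pending.append(RedirectReport(**json.loads(payload)))

    def confirm_and_publish(self, confirm: Callable[[RedirectReport], bool]) -> Set[str]:
        # Add confirmed redirect domains to the database, then return a copy of the
        # updated database for distribution to all interception devices.
        for report in self.pending:
            if confirm(report):
                self.malicious_domains.add(report.redirect_domain)
        self.pending.clear()
        return set(self.malicious_domains)


def encode_report(report: RedirectReport) -> str:
    """Serialize a report on the interception-device side."""
    return json.dumps(asdict(report))
```

Keeping confirmation on the analysis side means an interception device's local verdict blocks traffic immediately, while entry into the shared database still goes through central review.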

[0049] Referring to Figure 2, a method for real-time identification of malicious domain names based on fingerprint association includes the following steps:

[0050] S1. Obtain the first access request, extract the first domain name, and if the first domain name matches the malicious domain name database, intercept the first access request, extract the fingerprint feature of the first client that initiated the first access request, and store the association between the first client fingerprint feature and the first domain name in the interception fingerprint database.

[0051] Specifically, the interception device parses the incoming raw network traffic to obtain the first access request and extracts the first domain name from it. If the first domain name matches the pre-stored malicious domain name database, an interception action is performed; otherwise, the first access request is allowed. When the first access request is matched and intercepted, the client's composite fingerprint information is extracted as the first client fingerprint feature, and the association between the first client fingerprint feature and the first domain name is stored in the interception fingerprint database. The client fingerprint feature includes the client IP, protocol type, TLS fingerprint, HTTP request header structure fingerprint, and interception time; these are encapsulated into a structured interception log and stored in a local high-speed cache database.

[0052] Client IP extraction involves obtaining the client's source IP address by parsing the IP header in the data packet.

[0053] Protocol type extraction: Obtain the protocol type from the protocol fields of the data packet, such as TCP, UDP, HTTP, or TLS (Transport Layer Security). For encrypted traffic, extract the TLS version information during the TLS handshake phase.

[0054] TLS fingerprint extraction: Information such as supported encryption algorithms and extensions is extracted from the Client Hello request message in the TLS handshake to generate a TLS fingerprint (such as a JA3 fingerprint).
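The JA3 method mentioned above builds a string from five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, and elliptic curve point formats), writing values in decimal, joining list items with `-` and fields with `,`, skipping GREASE values (RFC 8701), and then taking the MD5 hash. The Python sketch below assumes the ClientHello has already been parsed into integer lists; the function names are illustrative, and this is not presented as the patent's implementation.

```python
import hashlib
from typing import Iterable


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) look like 0x0A0A, 0x1A1A, ..., 0xFAFA."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _join(values: Iterable[int]) -> str:
    # Decimal values joined by '-', skipping GREASE entries as JA3 does.
    return "-".join(str(v) for v in values if not is_grease(v))


def ja3_string(version: int, ciphers: Iterable[int], extensions: Iterable[int],
               curves: Iterable[int], point_formats: Iterable[int]) -> str:
    """Build the comma-separated JA3 string from already-parsed ClientHello fields."""
    return ",".join([str(version), _join(ciphers), _join(extensions),
                     _join(curves), _join(point_formats)])


def ja3_hash(version: int, ciphers: Iterable[int], extensions: Iterable[int],
             curves: Iterable[int], point_formats: Iterable[int]) -> str:
    """MD5 hex digest of the JA3 string, used here as the TLS fingerprint."""
    s = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(s.encode("ascii")).hexdigest()
```

Because the hash depends on the client's TLS library configuration rather than on the requested domain, it stays the same when a piece of malware switches from domain A to domain B.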

[0055] HTTP request header fingerprint extraction: Parse fields in the HTTP request header, such as User-Agent, Accept, Accept-Encoding, Connection, etc., and construct the HTTP request header structure fingerprint.
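The text does not specify how the header fields are combined, so the following Python sketch makes its own assumptions: the fingerprint is a SHA-256 hash, truncated to 16 hex characters, of the lowercased header names in their original order plus the values of the four fields named above. The `Host` value is deliberately left out because it changes whenever the client switches domain names; `header_structure_fingerprint` and `KEY_FIELDS` are hypothetical names.

```python
import hashlib
from typing import List, Tuple

# Fields named in the text; including exactly these values is an assumption.
KEY_FIELDS = ("user-agent", "accept", "accept-encoding", "connection")


def header_structure_fingerprint(headers: List[Tuple[str, str]]) -> str:
    """Hash header order plus selected field values into a short fingerprint."""
    names = [name.strip().lower() for name, _ in headers]
    values = {name.strip().lower(): value.strip() for name, value in headers}
    order_part = ",".join(names)  # header names in the order the client sent them
    field_part = "|".join(values.get(field, "") for field in KEY_FIELDS)
    digest = hashlib.sha256(f"{order_part}#{field_part}".encode("utf-8")).hexdigest()
    return digest[:16]
```

Including header order is a common heuristic because different HTTP client implementations tend to emit headers in different orders, so two requests from the same tool produce the same fingerprint even when their target domains differ.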

[0056] Interception time: Records the time when traffic was intercepted, such as by obtaining the system timestamp or packet header timestamp.
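Step S1 can be sketched in Python as a fingerprint record plus an in-memory interception fingerprint database. The dictionary is a hypothetical stand-in for the local high-speed cache; the names `ClientFingerprint`, `InterceptionFingerprintDB`, and `handle_first_request` are illustrative; and keeping the interception time in the log record rather than in the lookup key is an assumption, made so that later requests from the same client can still match.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class ClientFingerprint:
    """Composite client fingerprint used as the lookup key (hashable because frozen)."""
    client_ip: str
    protocol: str
    tls_fp: str
    http_header_fp: str


@dataclass
class InterceptionRecord:
    """One structured interception log entry associated with a fingerprint."""
    domain: str
    intercepted_at: float


class InterceptionFingerprintDB:
    """In-memory stand-in for the local high-speed cache (hypothetical)."""

    def __init__(self) -> None:
        self._records: Dict[ClientFingerprint, List[InterceptionRecord]] = {}

    def record_interception(self, fp: ClientFingerprint, domain: str,
                            ts: Optional[float] = None) -> None:
        stamp = time.time() if ts is None else ts
        self._records.setdefault(fp, []).append(InterceptionRecord(domain, stamp))

    def lookup(self, fp: ClientFingerprint) -> List[InterceptionRecord]:
        # Return a copy so callers cannot mutate the stored log.
        return list(self._records.get(fp, []))


def handle_first_request(domain: str, fp: ClientFingerprint,
                         malicious_domains: Set[str], db: InterceptionFingerprintDB) -> str:
    """S1: block blacklisted domains and remember which client requested them."""
    if domain in malicious_domains:
        db.record_interception(fp, domain)
        return "block"
    return "allow"
```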

[0057] S2. Obtain the second access request, extract the second domain name, and if the second domain name does not match the malicious domain name database, extract the fingerprint feature of the second client that initiated the second access request and query the interception fingerprint database using it as the search keyword. If the query hits the fingerprint feature of the first client, obtain the first domain name associated with the fingerprint feature of the first client.

[0058] After the first access request is blocked, the client may automatically change the requested domain name and keep trying to reach the site. The interception device then obtains the second access request and extracts the second domain name. Because the second domain name does not match the malicious domain name database, the interception device extracts the composite fingerprint feature of the client that initiated the second access request, i.e., the second client fingerprint feature. The fingerprint is constructed in the same way as in step S1 and can include the IP, TLS fingerprint, and HTTP structure fingerprint. The interception log database is queried using this second fingerprint as the search keyword. If no interception record corresponding to the fingerprint is found, the second access request is allowed. If the query hits, that is, the fingerprint matches the first client fingerprint feature in the interception log database (the interception fingerprint database), the second access request is closely related to the first in terms of client address and protocol. The second access request is therefore preliminarily flagged as a suspected evasion attempt from the same source, and the process enters the deep risk assessment stage.
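The S2 branch can be sketched in Python as a small decision function. As a simplification, it assumes the interception fingerprint database is a plain dictionary from a fingerprint tuple `(client_ip, protocol, tls_fp, http_header_fp)` to the list of previously blocked domains, and that a "hit" means an exact match on the whole tuple; the function name `screen_second_request` and the `"assess"` label are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (client_ip, protocol, tls_fp, http_header_fp); the tuple layout is an assumption.
Fingerprint = Tuple[str, str, str, str]


def screen_second_request(domain: str, fp: Fingerprint, malicious_domains: Set[str],
                          interception_db: Dict[Fingerprint, List[str]]) -> Tuple[str, List[str]]:
    """Return ("block" | "allow" | "assess", previously blocked domains for this client)."""
    if domain in malicious_domains:
        # Already blacklisted: ordinary database interception applies.
        return "block", []
    history = interception_db.get(fp, [])
    if not history:
        # No interception record for this fingerprint: let the request through.
        return "allow", []
    # Fingerprint hit: hand the first domain(s) to the similarity assessment (S3).
    return "assess", list(history)
```

Only the `"assess"` path pays the cost of similarity scoring, so unrelated clients are allowed after a single dictionary lookup.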

[0059] This invention introduces TLS fingerprinting and HTTP request header structure fingerprinting, so verification is not limited to the domain name string itself but is multi-dimensional, drawing on underlying tool characteristics and traffic statistics. This greatly improves the ability to identify machine-generated random domain names, with a false positive rate significantly lower than that of simple string matching.

[0060] S3. Calculate the multidimensional feature similarity score between the second domain name and the first domain name. The similarity score is calculated by mapping the domain name's features in three dimensions: literal, statistical, and structural.

[0061] This invention constructs a deep risk discrimination model for redirected domains based on multi-dimensional feature mapping. By mapping the features of the new domain (second domain) and the historically blocked domain (first domain) across three dimensions—literal, statistical, and structural—a comprehensive association score S is calculated. The specific discrimination logic is as follows:

[0062] Position-weighted edit distance discrimination: This step aims to identify escape attempts in which fraud rings tweak non-core characters (such as serial numbers and random suffixes). The domain name string is divided into a core identifier segment (the main body of the second-level domain) and an auxiliary feature segment (subdomains and suffixes). When calculating the edit distance between the two domain names, a higher weighting coefficient is assigned to differences in the core identifier segment and a lower weighting coefficient to differences in the auxiliary feature segment.

[0063] In this case, the weighted edit distance is calculated as follows:

[0064] First, divide each of the two domain names into a core identifier segment and an auxiliary feature segment. The core identifier segment is usually the main part of the domain name (such as example.com in www.example.com), while the auxiliary feature segment is the prefix or subdomain of the domain name (such as www in www.example.com).

[0065] Then, the edit distances for the core identifier segment and the auxiliary feature segment are calculated separately. For the core identifier segment, a conventional edit distance algorithm (such as Levenshtein distance) is used, and its differences are assigned higher weights; for the auxiliary feature segment, the same algorithm is used, but with lower weights. The final formula for calculating the weighted edit distance is as follows:

[0066] $D_w = w_c \cdot D_c + w_a \cdot D_a$

[0067] where $D_w$ is the final weighted edit distance, $D_c$ is the edit distance of the core identifier segment, $D_a$ is the edit distance of the auxiliary feature segment, and $w_c$ and $w_a$ are the weight coefficients of the core identifier segment and the auxiliary feature segment, respectively, with $w_c > w_a$.
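A Python sketch of the weighted edit distance follows. Several details are assumptions rather than the patent's specification: the segmentation follows [0062] (the second-level label is the core identifier segment; subdomains plus the final label form the auxiliary segment) and treats only the last label as the suffix, so multi-label public suffixes such as `co.uk` would need the Public Suffix List; the weights `w_core=0.7` and `w_aux=0.3` are illustrative; and the normalization that turns the weighted distance into a similarity in [0, 1] is introduced here for illustration.

```python
from typing import Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def split_domain(domain: str) -> Tuple[str, str]:
    """Return (core segment, auxiliary segment); naive single-label suffix assumed."""
    labels = domain.lower().rstrip(".").split(".")
    if len(labels) < 2:
        return labels[0], ""
    core = labels[-2]
    auxiliary = ".".join(labels[:-2] + labels[-1:])  # subdomains plus suffix
    return core, auxiliary


def weighted_edit_distance(d1: str, d2: str, w_core: float = 0.7, w_aux: float = 0.3) -> float:
    c1, a1 = split_domain(d1)
    c2, a2 = split_domain(d2)
    return w_core * levenshtein(c1, c2) + w_aux * levenshtein(a1, a2)


def weighted_edit_similarity(d1: str, d2: str, w_core: float = 0.7, w_aux: float = 0.3) -> float:
    """Hypothetical normalization: 1 - distance / largest possible weighted distance."""
    c1, a1 = split_domain(d1)
    c2, a2 = split_domain(d2)
    worst = w_core * max(len(c1), len(c2)) + w_aux * max(len(a1), len(a2))
    if worst == 0:
        return 1.0
    return 1.0 - weighted_edit_distance(d1, d2, w_core, w_aux) / worst
```

With these weights, a one-character change in the core segment (`scam1.com` vs `scam2.com`, distance 0.7) counts more than a one-character change in a subdomain (`a.scam.com` vs `b.scam.com`, distance 0.3), reflecting the requirement that the core weight exceed the auxiliary weight.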

[0068] Character information entropy offset discrimination: This step is used to identify families of random domains generated by similar domain generation algorithms. The Shannon entropy of both the old and new domains is calculated to reflect the randomness of their character distributions, and the correlation is determined by comparing the entropy offset (i.e., the absolute value of the difference) between the two. If both the old and new domains are in the high-entropy range and the offset is extremely small, they can be determined to belong to a series of domains generated by the same attack script.

[0069] The specific calculation method is as follows: calculate the Shannon entropy for the new domain name, denoted as H(new), calculate the Shannon entropy for the old domain name, denoted as H(old), and calculate the entropy difference between the new and old domain names:

[0070] $\Delta H = \left| H(\text{new}) - H(\text{old}) \right|$

[0071] The score is calculated based on the entropy shift. The specific scores are as follows:

[0072]

[0073] This formula means that the smaller the entropy difference (that is, the more similar the character distributions of the two domain names), the higher the score; conversely, the larger the entropy difference, the lower the score.
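The entropy offset and the same-source rule of [0018] can be sketched in Python as below, computing Shannon entropy in bits over the characters of the whole domain string. Because the exact scoring formula in [0072] is not reproduced in this text, `entropy_score` uses exp(-ΔH / scale) as a stand-in decreasing function, and the thresholds `high_entropy_min=3.0` and `max_offset=0.2` are hypothetical values.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the character distribution of s, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def entropy_offset(new: str, old: str) -> float:
    """Absolute difference between the two entropy values."""
    return abs(shannon_entropy(new) - shannon_entropy(old))


def entropy_score(new: str, old: str, scale: float = 1.0) -> float:
    # Stand-in mapping (assumption): 1.0 for identical entropy, decaying as the offset grows.
    return math.exp(-entropy_offset(new, old) / scale)


def same_generator(new: str, old: str, high_entropy_min: float = 3.0,
                   max_offset: float = 0.2) -> bool:
    """Both strings high-entropy and close in entropy: likely the same DGA family."""
    h_new, h_old = shannon_entropy(new), shannon_entropy(old)
    return (h_new >= high_entropy_min and h_old >= high_entropy_min
            and abs(h_new - h_old) < max_offset)
```

Two 16-character strings with all-distinct characters both reach the maximum of 4 bits and so pass the check, while a readable name such as `google.com` falls below the illustrative 3-bit floor.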

[0074] Character pattern structure fingerprint similarity discrimination: This step is used to identify domain names constructed using fixed visual templates or character substitutions. Processing logic: each character in the domain name is mapped to a corresponding classification code according to its attribute (letter, number, special symbol), forming a topological structure sequence. Judgment logic: the topological sequences of the new and old domain names are compared; if the structural arrangement is completely identical, a high similarity score is assigned.

[0075] In this case, the method for calculating the similarity of character pattern structure fingerprints is as follows:

[0076] First, the characters in each domain name are categorized and assigned corresponding category codes based on their type (letters, numbers, special symbols, etc.). For example, L represents letters, S represents special symbols, N represents numbers, and P represents dots (i.e., separators in the domain name).

[0077] Then, a topology sequence is constructed: the entire domain name is mapped character-wise to a sequence of category codes. For example, the domain name "scam-01.com" will be mapped to "LLLLSNNPLLL".

[0078] Finally, the similarity is calculated by comparing the topological sequences of the new and old domain names. If the two sequences are completely identical, the similarity is 1 (high similarity). If they differ, the similarity decreases linearly with the degree of difference, so the final score ranges from 0 to 1: similarity score = number of matching characters / total number of characters, where the number of matching characters is the number of positions at which the new and old domain names have the same classification code, and the total number of characters is the total number of characters in the domain name.
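A minimal sketch of the topology mapping and positional similarity in [0076] to [0078]. Two details are assumptions not fixed by the text: characters are compared position by position without any alignment, and when the two domain names differ in length, the length of the longer sequence is used as the total number of characters.

```python
import string


def char_class(ch: str) -> str:
    """Category code per [0076]: L letter, N number, P dot separator, S other symbol."""
    if ch == ".":
        return "P"
    if ch in string.ascii_letters:
        return "L"
    if ch in string.digits:
        return "N"
    return "S"


def topology(domain: str) -> str:
    """Map a domain name character-wise to its topological structure sequence."""
    return "".join(char_class(ch) for ch in domain)


def structure_similarity(new_domain: str, old_domain: str) -> float:
    """Matching positions / total characters (assumed: length of the longer sequence)."""
    a, b = topology(new_domain), topology(old_domain)
    total = max(len(a), len(b))
    if total == 0:
        return 1.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / total


print(topology("scam-01.com"))                             # LLLLSNNPLLL
print(structure_similarity("scam-01.com", "fake-02.com"))  # 1.0
```

The second call returns 1.0 because "fake-02.com" follows exactly the same letter/symbol/number/dot template as "scam-01.com", which is the template-reuse case this dimension is meant to catch.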

[0079] A comprehensive weighted scoring mechanism is used: the scores from the three dimensions are weighted and summed to obtain a comprehensive correlation score S.

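[0080] $$S = w_1 S_{\text{edit}} + w_2 S_{\text{ent}} + w_3 S_{\text{struct}}$$

[0081] where $S_{\text{edit}}$ is the weighted edit distance similarity (literal dimension), $S_{\text{ent}}$ is the character information entropy offset score (statistical dimension), $S_{\text{struct}}$ is the character pattern structure fingerprint similarity (structural dimension), and $w_1$, $w_2$, $w_3$ are preset weights whose sum is 1. The preset weights can be configured flexibly according to the interception tendency of the application scenario.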
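The weighted sum over the three dimensions can be sketched as below. The inputs are the weighted edit distance similarity, the entropy-offset score, and the structural fingerprint similarity, each assumed to already lie in [0, 1]. The default weights (0.4, 0.3, 0.3) are hypothetical placeholders; the text only requires that the weights be preset, configurable per scenario, and sum to 1.

```python
import math

# Hypothetical default weights for (edit, entropy, structure); the patent leaves them configurable.
DEFAULT_WEIGHTS = (0.4, 0.3, 0.3)


def composite_score(s_edit: float, s_ent: float, s_struct: float,
                    weights: tuple = DEFAULT_WEIGHTS) -> float:
    """Comprehensive correlation score S = w1*s_edit + w2*s_ent + w3*s_struct."""
    w1, w2, w3 = weights
    # Enforce the constraint stated in the text: preset weights summing to 1.
    if min(weights) < 0 or not math.isclose(w1 + w2 + w3, 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    for s in (s_edit, s_ent, s_struct):
        if not 0.0 <= s <= 1.0:
            raise ValueError("dimension scores are expected in [0, 1]")
    return w1 * s_edit + w2 * s_ent + w3 * s_struct
```

For instance, `composite_score(0.95, 0.9, 1.0)` gives roughly 0.95 with the placeholder weights; shifting weight toward the structural term would make a deployment more sensitive to template-built domains.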

[0082] S4. When the similarity score exceeds a preset threshold, the second domain name is determined to be a malicious redirect domain name, and the second access request is blocked in real time.

[0083] The calculated comprehensive correlation score S is compared with the current dynamic blocking threshold T. If S ≥ T, the second domain name is determined to be a malicious redirect domain name.

[0084] The interception threshold T can be adjusted dynamically based on the client's attack frequency. For example, T is inversely proportional to the attack frequency per unit time: the higher the attack frequency, the lower the value of T and the stricter the interception. If redirects are triggered repeatedly within a short period of time, the interception strategy automatically switches from lenient to strict.
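One way to realize the dynamic threshold, sketched under stated assumptions: following claim 7, the attack frequency is measured as the number of interception log entries for the same client fingerprint inside a sliding time window, and T is lowered as that count rises. The hyperbolic form T = max(floor, base / (1 + alpha * hits)) and all parameter values are hypothetical; the text only requires that T fall as the attack frequency rises. The decision itself is the S ≥ T rule of [0083].

```python
import time
from collections import defaultdict, deque
from collections.abc import Hashable
from typing import Optional


class DynamicThreshold:
    """Blocking threshold T that falls as a client's recent interception count rises."""

    def __init__(self, base: float = 0.9, floor: float = 0.5,
                 alpha: float = 0.5, window_s: float = 60.0) -> None:
        self.base = base          # T with no recent hits (assumed value)
        self.floor = floor        # strictest allowed T (assumed value)
        self.alpha = alpha        # sensitivity to the hit count (assumed value)
        self.window_s = window_s  # sliding window length in seconds (assumed value)
        # fingerprint -> timestamps of interception log entries, in nondecreasing order
        self._hits = defaultdict(deque)

    def record_interception(self, fingerprint: Hashable, now: Optional[float] = None) -> None:
        """Log one interception for this client fingerprint."""
        self._hits[fingerprint].append(time.monotonic() if now is None else now)

    def threshold(self, fingerprint: Hashable, now: Optional[float] = None) -> float:
        """Current T for this fingerprint; entries older than the window are dropped first."""
        now = time.monotonic() if now is None else now
        hits = self._hits[fingerprint]
        while hits and now - hits[0] > self.window_s:
            hits.popleft()
        return max(self.floor, self.base / (1.0 + self.alpha * len(hits)))

    def is_malicious_redirect(self, score: float, fingerprint: Hashable,
                              now: Optional[float] = None) -> bool:
        """Malicious redirect if S >= T."""
        return score >= self.threshold(fingerprint, now)
```

With these illustrative parameters, a client with no recent interceptions faces T = 0.9, one hit in the window lowers T to 0.6, and two or more hits pin T at the 0.5 floor, so repeated redirect attempts quickly move the policy from lenient to strict.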

[0085] Furthermore, after determining that a domain is a malicious redirect, the interception device immediately blocks the request and pushes the second domain name, as a derived malicious domain, to the malicious analysis device in real time so that it can be analyzed and entered into the malicious domain name database. This further improves the malicious domain database and forms a closed loop of "real-time interception - post-event database entry - long-term protection".

[0086] This invention departs from the traditional, lagging model of "first analyze and store in the database, then issue interception rules." Through local fingerprint association and algorithmic deduction, it can identify and block redirected domains within milliseconds of their appearance, effectively addressing the problem of attackers escaping by switching domains. Furthermore, it can be deployed as a smooth upgrade of the existing "analysis device + interception device" architecture without rebuilding the core hardware: only a composite-feature log retention database (the interception fingerprint database) and hierarchical risk judgment logic need to be added to the interception device. It therefore has strong practical engineering value and is easy to promote and apply.

[0087] To verify the actual effectiveness of the method of this invention in internet anti-fraud and malicious traffic interception scenarios, comparative tests were conducted. The test sample set included 50,000 known malicious domain names, 10,000 evasive redirect domain names newly generated with domain generation algorithms, and "same fingerprint, multiple domain names" sequential attack traffic initiated by simulated fraud scripts.

[0088] The technical solution of the present invention is compared with the traditional interception solution, and the results are shown in Table 1 below.

[0089] Table 1. Interception performance of different schemes on the test sample set

[0090]

| Experimental indicator | Traditional solution | This invention | Improvement evaluation |
|---|---|---|---|
| Response latency for redirected domains | 300 s to 1800 s (limited by synchronous analysis) | <20 ms (local real-time simulation) | Response speed greatly improved |
| Interception success rate against automated tools | Extremely low (attackers simply change the domain name to escape) | Recognition probability significantly improved | Automated fraud scripts effectively suppressed |
| Attacker bypass cost | Extremely low (only requires switching to another cheap domain name) | Significantly increased (attackers must change both the domain generation strategy and the client behavior characteristics at the same time) | Offense-defense asymmetry significantly improved |

[0091] This invention also provides a real-time malicious domain name identification device based on fingerprint association, as shown in Figure 1, comprising:

[0092] The fingerprint database construction module is used to obtain the first access request, extract the first domain name, intercept the first access request if the first domain name matches the malicious domain name database, extract the fingerprint features of the first client that initiated the first access request, and store the association between the fingerprint features of the first client and the first domain name in the interception fingerprint database.

[0093] The fingerprint association matching module is used to obtain the second access request and extract the second domain name; if the second domain name does not match the malicious domain name database, it extracts the fingerprint features of the second client that initiated the second access request, queries the interception fingerprint database using the second client fingerprint features as the search key, and, if the query hits the first client fingerprint features, obtains the first domain name associated with them.

[0094] The multidimensional feature scoring module is used to calculate the multidimensional feature similarity score between the second domain name and the first domain name. The similarity score is calculated from feature mappings of the domain names in three dimensions: literal, statistical, and structural.

[0095] The identification and judgment module is used to determine that the second domain name is a malicious redirect domain name when the similarity score exceeds a preset threshold, and to intercept the second access request in real time.
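How the four modules interact can be sketched as an in-memory pipeline. The `ClientFingerprint` fields follow claim 2 (client IP, protocol type, TLS fingerprint, HTTP request header structure fingerprint), with the interception time stored alongside each entry rather than in the lookup key. Exact-match lookup on the full fingerprint, the class and function names, the return labels, and the pluggable `score_fn` are assumptions for illustration; in a full implementation `score_fn` would compute the comprehensive correlation score S of [0079] to [0081].

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ClientFingerprint:
    """Hashable lookup key built from the client features listed in claim 2."""
    client_ip: str
    protocol: str
    tls_fingerprint: str     # e.g. a hash of TLS ClientHello parameters (assumption)
    header_fingerprint: str  # e.g. a hash of HTTP header names and order (assumption)


class InterceptionFingerprintDB:
    """In-memory stand-in for the interception fingerprint database."""

    def __init__(self) -> None:
        self._entries = {}  # ClientFingerprint -> list of (domain, interception_time)

    def record(self, fp: ClientFingerprint, domain: str, ts: float) -> None:
        self._entries.setdefault(fp, []).append((domain, ts))

    def lookup(self, fp: ClientFingerprint) -> list:
        """Domains previously intercepted for exactly this fingerprint."""
        return [domain for domain, _ in self._entries.get(fp, [])]


def handle_request(domain: str, fp: ClientFingerprint, malicious_db: set,
                   fp_db: InterceptionFingerprintDB,
                   score_fn: Callable[[str, str], float], threshold: float,
                   now: Optional[float] = None) -> str:
    """Return "block", "block-redirect", or "allow" for one access request."""
    now = time.time() if now is None else now
    if domain in malicious_db:
        # Fingerprint database construction: block and remember who requested the known-bad domain.
        fp_db.record(fp, domain, now)
        return "block"
    # Fingerprint association matching: the client fingerprint is the search key.
    for old_domain in fp_db.lookup(fp):
        # Multidimensional scoring followed by identification and judgment.
        if score_fn(domain, old_domain) >= threshold:
            return "block-redirect"
    return "allow"
```

Because the lookup is keyed only on the client fingerprint, an unknown domain from a client with no interception history is never scored, which keeps the per-request cost low for ordinary traffic.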

[0096] It should be understood that the fingerprint-based real-time malicious domain name identification device in this embodiment can implement all the technical solutions of the above method embodiments. The functions of each functional module can be implemented according to the methods in those embodiments; for the specific implementation process, refer to the relevant descriptions in the method embodiments, which are not repeated here.

[0097] The present invention also provides an electronic device, comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, wherein when the programs are executed by the processors, they implement the fingerprint-based real-time identification method for malicious domain names as described above.

[0098] The present invention also provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the fingerprint-based real-time identification method for malicious domain names as described above.

Claims

1. A method for real-time identification of malicious domain names based on fingerprint correlation, characterized in that it comprises the following steps: obtaining a first access request and extracting a first domain name; if the first domain name matches the malicious domain name database, intercepting the first access request, extracting the fingerprint features of the first client that initiated the first access request, and storing the association between the first client fingerprint features and the first domain name in the interception fingerprint database; obtaining a second access request and extracting a second domain name; if the second domain name does not match the malicious domain name database, extracting the fingerprint features of the second client that initiated the second access request, querying the interception fingerprint database using the second client fingerprint features as the search key, and, if the query hits the first client fingerprint features, obtaining the first domain name associated with the first client fingerprint features, wherein the first client fingerprint features and the second client fingerprint features characterize access behavior initiated by the same client at different times; calculating a multidimensional feature similarity score between the second domain name and the first domain name, the similarity score being obtained from feature mappings of the domain names in three dimensions: literal, statistical, and structural; and, when the similarity score exceeds a preset threshold, determining that the second domain name is a malicious redirect domain name and blocking the second access request in real time.

2. The method of claim 1, wherein the first and second client fingerprint features include: client IP, protocol type, TLS fingerprint, HTTP request header structure fingerprint, and interception time.

3. The method of claim 1, wherein calculating the multidimensional feature similarity score between the second domain name and the first domain name includes calculating a weighted edit distance similarity between them, both the first and the second domain name being divided into a core identifier segment and an auxiliary feature segment; when the edit distance between the second domain name and the first domain name is calculated, a first weight is assigned to character differences in the core identifier segment and a second weight is assigned to character differences in the auxiliary feature segment, the first weight being greater than the second weight.

4. The method of claim 1, wherein calculating the multidimensional feature similarity score between the second domain name and the first domain name includes calculating the character information entropy offset between them as follows: calculating the Shannon entropy of the second domain name string and of the first domain name string; calculating the offset between the two entropy values as the information entropy offset; and, if both entropy values are within a preset high-entropy range and the offset is less than a preset offset threshold, determining that the two domain names were generated by the same source algorithm.

5. The method of claim 1, wherein calculating the multidimensional feature similarity score between the second domain name and the first domain name includes calculating the character pattern structure fingerprint similarity between them as follows: mapping the characters of the second domain name and the first domain name to category codes according to their attributes to form topological structure sequences, the character attributes including at least letter, number, and symbol categories; and comparing the topological sequences of the two domain names, assigning a first similarity score if they are completely identical, or a second similarity score based on the proportion of matching positions if they are partially identical.

6. The method of claim 1, wherein the multidimensional feature similarity score is calculated using the formula $S = w_1 S_{\text{edit}} + w_2 S_{\text{ent}} + w_3 S_{\text{struct}}$, wherein $S_{\text{edit}}$ is the weighted edit distance similarity, $S_{\text{ent}}$ is the information entropy offset score, $S_{\text{struct}}$ is the structural fingerprint similarity, and $w_1$, $w_2$, $w_3$ are preset weight coefficients whose sum is 1.

7. The method of claim 1, wherein the preset threshold is a dynamically adjusted threshold that is negatively correlated with the attack frequency per unit time, and is adjusted as follows: counting the number of interception log entries that match the second client fingerprint features within a preset time window; and dynamically lowering or raising the preset threshold based on the number of hits.

8. The method of claim 1, further comprising: if the second domain name is determined to be a malicious redirect domain name, pushing the second domain name and its corresponding multidimensional feature similarity score to the malicious domain analysis center to update the malicious domain name database.

9. An electronic device, comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and, when executed by the processors, the programs implement the fingerprint-based real-time identification method for malicious domain names according to any one of claims 1-8.

10. A malicious domain name real-time identification system based on fingerprint association, characterized in that it comprises: a malicious analysis device, configured to maintain the malicious domain name database and distribute it to the interception devices, and further configured to receive malicious redirect domain names and associated interception log information reported by the interception devices and, after analyzing and confirming a malicious redirect domain name, to add it to the malicious domain name database and distribute the updated database to all interception devices; and an interception device, configured to perform the real-time identification method for malicious domain names based on fingerprint association according to any one of claims 1-8.
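Claim 3's weighted edit distance can be sketched as a weighted Levenshtein distance. This text does not say how a domain name is split into its core identifier segment and auxiliary feature segment, or how the distance is normalized into a similarity, so both are assumptions here: letters of the second-level label form the core segment, and everything else (digits, hyphens, other labels, dots, the TLD) is auxiliary; the distance is normalized by the larger total character weight and clamped at 0. The weights 1.0 and 0.3 are hypothetical values that satisfy the claimed condition that the first weight exceeds the second.

```python
import string

# Hypothetical weights: claim 3 only requires W_CORE > W_AUX.
W_CORE = 1.0
W_AUX = 0.3


def char_weights(domain: str, w_core: float = W_CORE, w_aux: float = W_AUX) -> list:
    """Per-character weights; assumed core segment = letters of the second-level label."""
    labels = domain.split(".")
    sld_index = len(labels) - 2 if len(labels) >= 2 else 0
    weights = []
    for i, label in enumerate(labels):
        if i > 0:
            weights.append(w_aux)  # the "." separator preceding this label
        for ch in label:
            is_core = i == sld_index and ch in string.ascii_letters
            weights.append(w_core if is_core else w_aux)
    return weights


def weighted_edit_distance(a: str, b: str, wa: list, wb: list) -> float:
    """Levenshtein distance where each edit costs the weight of the character(s) it touches."""
    prev = [0.0]
    for w in wb:
        prev.append(prev[-1] + w)                # insert all of b[:j]
    for i, ca in enumerate(a):
        cur = [prev[0] + wa[i]]                  # delete all of a[:i+1]
        for j, cb in enumerate(b):
            sub = 0.0 if ca == cb else max(wa[i], wb[j])
            cur.append(min(prev[j + 1] + wa[i],  # delete a[i]
                           cur[j] + wb[j],       # insert b[j]
                           prev[j] + sub))       # match or substitute
        prev = cur
    return prev[-1]


def weighted_edit_similarity(new_domain: str, old_domain: str,
                             w_core: float = W_CORE, w_aux: float = W_AUX) -> float:
    """1 - distance / larger total weight, clamped at 0 (normalization is an assumption)."""
    if not w_core > w_aux:
        raise ValueError("the core-segment weight must exceed the auxiliary weight")
    a, b = new_domain.lower(), old_domain.lower()
    wa, wb = char_weights(a, w_core, w_aux), char_weights(b, w_core, w_aux)
    denom = max(sum(wa), sum(wb))
    if denom == 0:
        return 1.0
    return max(0.0, 1.0 - weighted_edit_distance(a, b, wa, wb) / denom)
```

With these assumed weights, changing the digits in "scam-01.com" to "scam-02.com" costs 0.3 and yields a similarity of about 0.95, whereas changing a core letter ("scam-01.com" to "scan-01.com") costs 1.0 and yields about 0.84, so cheap suffix rotations stay strongly linked to the original domain.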