Domain name detection method and device
A domain name detection technology, applied in the computer field, that addresses data security threats arising when it cannot be determined whether a received DN is illegal.
Active Publication Date: 2020-09-11
亚信科技(成都)有限公司
Abstract
The invention discloses a domain name detection method and device, relates to the technical field of computers, and is used for detecting whether an internationalized domain name (IDN) is valid or not. The method comprises the steps of obtaining a first feature sequence, wherein the first feature sequence is used for uniquely identifying a to-be-detected IDN; determining the similarity between the first feature sequence and each feature sequence in a stored feature sequence set, wherein each feature sequence in the feature sequence set is used for uniquely identifying a valid domain name; and if the target similarity is greater than a first preset threshold, determining that the to-be-detected IDN is invalid, wherein the target similarity is the similarity with the highest numerical value among the determined similarities. The method in the embodiment of the invention is applied to detecting invalid domain names.
Example Embodiment
[0033] The technical solutions in the embodiments of the present invention will be described below in conjunction with the drawings in the embodiments of the present invention.
[0034] In the description of the present invention, unless otherwise specified, "/" means "or"; for example, A/B can mean A or B. "And/or" herein merely describes an association between associated objects and covers three cases; for example, A and/or B can mean: A exists alone, both A and B exist, or B exists alone. In addition, "at least one" means one or more, and "plurality" means two or more. The words "first" and "second" do not limit quantity or order of execution, nor do they necessarily indicate that the items so labeled are different.
[0035] The inventive concept of the present invention is introduced below. At present, operation and maintenance personnel set a blacklist and a whitelist in the DNS server or gateway: the blacklist stores the illegal DNs that have been discovered, and the whitelist stores the authenticated legal DNs. After the DNS server or the gateway receives the DN corresponding to an IDN, it can judge whether that IDN is legal or illegal according to the entries stored in the blacklist and the whitelist.
[0036] Based on the above technology, the present invention finds that after a hacker performs variant processing on an existing illegal IDN, a new illegal IDN will be generated. Correspondingly, the new illegal IDN also corresponds to a new DN. After the DNS server or gateway receives the new DN, since the DN corresponding to the new illegal IDN does not exist in the blacklist, the DNS server or gateway cannot determine whether the new IDN is an illegal IDN.
[0037] In view of the above technical problem, the present invention considers that when hackers disguise an IDN, the core approach is to make it difficult for users to tell the illegal IDN apart from the legal domain name it imitates. After the DNS server receives an IDN, if it can uniquely identify both the IDN and the legal domain names by some method, and determine from the similarity between the identification results whether the new IDN is disguised as a certain legal domain name, then it can determine whether the new IDN is illegal.
[0038] Based on the above inventive concept, embodiments of the present invention provide a domain name detection method, which determines whether the IDN to be detected is an illegal IDN by calculating the similarity between the characteristic sequence of the IDN to be detected and the characteristic sequence of the legal domain name.
[0039] The domain name detection method provided by the embodiment of the present invention is applied to a domain name detection system. Figure 2 shows a schematic structural diagram of the domain name detection system. As shown in Figure 2, the domain name detection system 10 includes a domain name detection device 11 and a network device 12. The domain name detection device 11 and the network device 12 may be connected in a wired manner or in a wireless manner, which is not limited in the embodiment of the present invention.
[0040] The domain name detection device 11 can be used to detect the legality of an acquired domain name, and can also be used to send a DN, receive a DN, convert between an IDN and a DN (for example, as shown in Figure 1, domain name 1 and domain name 2 are converted into each other), and resolve a DN to obtain an IP address.
[0041] The network device 12 may be a DNS server, or a firewall or gateway.
[0042] It should be noted that the domain name detection apparatus 11 and the network device 12 may be independent devices, or may be integrated into the same device, which is not specifically limited in the present invention.
[0043] When the domain name detection device 11 and the network device 12 are integrated in the same device, the communication mode between the domain name detection device 11 and the network device 12 is the communication between the internal modules of the device. In this case, the communication flow between the two is the same as the "communication flow between the domain name detection device 11 and the network device 12 when they are independent of each other."
[0044] In the following embodiments, the present invention is described by taking as an example the case where the domain name detection device 11 and the network device 12 are independent of each other.
[0045] The domain name detection method provided by the embodiment of the present invention is described below with reference to the domain name detection system 10 shown in Figure 2.
[0046] As shown in Figure 3, the domain name detection method provided in this embodiment includes S201-S205:
[0047] S201. The domain name detection device 11 acquires a first characteristic sequence.
[0048] The first characteristic sequence is used to uniquely identify the internationalized domain name (IDN) to be detected.
[0049] Exemplarily, the first characteristic sequence may be a character string with a preset length.
[0050] S202. The domain name detection device 11 determines the similarity between the first feature sequence and each feature sequence in the stored feature sequence set.
[0051] Among them, each characteristic sequence in the characteristic sequence set is used to uniquely identify a legal domain name.
[0052] It should be noted that each feature sequence in the feature sequence set is generated based on a legal domain name.
[0053] S203. The domain name detection device 11 determines the target similarity from the determined similarities.
[0054] Among them, the target similarity is the similarity with the highest value among the determined similarities.
[0055] S204. The domain name detection device 11 determines whether the target similarity is greater than a first preset threshold.
[0056] It should be noted that the first preset threshold may be set in the domain name detection device 11 by the operation and maintenance personnel.
[0057] S205: If the target similarity is greater than the first preset threshold, the domain name detection device 11 determines that the IDN to be detected is illegal.
[0058] In one design, the domain name detection device 11 can update the DN corresponding to the IDN to be detected to the blacklist after determining that the IDN to be detected is illegal.
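The flow of S201-S205 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function names, the position-matching similarity measure `matching_ratio`, and the threshold values used in the usage note are hypothetical and are not taken from the embodiment, which leaves the similarity measure and the first preset threshold open at this point.

```python
from typing import Callable, Iterable


def matching_ratio(a: str, b: str) -> float:
    """Hypothetical similarity measure (an assumption, not from the text):
    the fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def is_illegal_idn(
    first_seq: str,
    legal_seqs: Iterable[str],
    first_threshold: float,
    similarity: Callable[[str, str], float] = matching_ratio,
) -> bool:
    """Return True if the IDN behind first_seq is judged illegal."""
    # S202: similarity between the first sequence and every stored sequence.
    similarities = [similarity(first_seq, seq) for seq in legal_seqs]
    if not similarities:
        return False  # nothing to compare against (assumed behaviour)
    # S203: the target similarity is the highest of the determined similarities.
    target = max(similarities)
    # S204-S205: illegal if the target similarity exceeds the first threshold.
    return target > first_threshold
```

For example, `is_illegal_idn("1111", ["1110", "0000"], 0.7)` compares against two stored sequences and flags the input because the closest one agrees in three of four positions.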
[0059] In the embodiment of the present invention, in order to obtain the first characteristic sequence, with reference to Figure 3 and as shown in Figure 4, S201 in the embodiment of the present invention may specifically include S2011-S2013:
[0060] S2011. The domain name detecting device 11 obtains the IDN to be detected.
[0061] As a possible implementation manner, the domain name detection apparatus 11 may receive the IDN to be detected sent by the network device 12.
[0062] As another possible implementation manner, the domain name detection device 11 may also convert the DN into an IDN to be detected after receiving the DN sent by the network device 12.
[0063] S2012. The domain name detection device 11 loads the IDN to be detected into a preset image according to a preset rule to generate the image to be detected.
[0064] The preset rules specifically include one or more of the character size, font type, character spacing, starting position, and character distribution mode of the domain name to be detected displayed in the preset image.
[0065] As a possible implementation, in order to reduce the amount of data processing of the subsequent image to be detected, the preset image may be a blank image.
[0066] It should be noted that the size of the preset image is larger than the size of the IDN to be detected after being displayed according to the preset rule. The generated image to be detected may specifically be a jpeg or png format type.
[0067] Exemplarily, Figure 5(a) gives an optional image to be detected, which contains the IDN to be detected laid out according to the preset rule. The preset image may specifically include 64*64 pixels; the font type of the IDN to be detected is Arial, the font size is 10, and the characters of the IDN to be detected are distributed from left to right and from top to bottom.
[0068] S2013. The domain name detection device 11 recognizes the image to be detected to determine the first feature sequence.
[0069] Among them, the first feature sequence is specifically used to uniquely identify the content displayed in the image to be detected by the IDN to be detected.
[0070] In the embodiment of the present invention, when the domain name detection device 11 receives the DN sent by the network device 12, in order to ensure the reliability of subsequent domain name detection, with reference to Figure 4 and as shown in Figure 6, S2011 in the embodiment of the present invention may specifically include S20111-S20114:
[0071] S20111. The domain name detecting apparatus 11 receives the DN sent by the network device 12.
[0072] S20112. The domain name detecting device 11 judges whether the DN sent by the network device 12 can be converted into an IDN.
[0073] As a possible implementation manner, the domain name detection apparatus 11 analyzes the DN sent by the network device 12 to determine whether there is an IDN identification code in the DN.
[0074] The IDN identification code is generated automatically when the IDN is Punycode-encoded, before the network device 12 sends the DN to the domain name detection device 11 (Punycode encoding is used to convert an IDN into a DN); the IDN identification code is used to identify a DN that corresponds to an IDN.
[0075] Exemplarily, the IDN identification code may specifically be the xn in domain name 1 shown in Figure 1.
[0076] In one case, if there is an IDN identification code in the DN, the domain name detection device 11 determines that the DN sent by the network device 12 can be converted into an IDN, and executes the following S20113.
[0077] In another case, if the IDN identification code does not exist in the DN, the domain name detection device 11 resolves the DN to obtain an IP address.
[0078] It should be noted that the specific implementation of the domain name detection device 11 to resolve the DN to obtain the IP address can refer to the prior art, which will not be repeated here.
[0079] S20113: If it is determined that the DN sent by the network device 12 can be converted into an IDN, the domain name detection device 11 queries whether the blacklist or the whitelist includes the DN sent by the network device 12.
[0080] It should be noted that the blacklist includes a list of illegal domain names (including DNs corresponding to illegal IDNs and illegal non-IDN domain names); the whitelist includes a list of legal domain names (including DNs corresponding to legal IDNs and legal non-IDN domain names). The blacklist and whitelist can be stored in the domain name detection device 11, or can be stored in a storage device that can communicate with the domain name detection device 11.
[0081] S20114. If it is determined that neither the blacklist nor the whitelist includes the DN sent by the network device 12, the domain name detection device 11 performs Punycode decoding on the received DN to obtain the IDN to be detected.
[0082] Among them, Punycode decoding is used to convert DN to IDN.
[0083] It should be noted that, for the implementation of Punycode decoding of the received DN by the domain name detection device 11, reference may be made to the prior art for details, and details are not described here.
[0084] It is understandable that, with the above S20111-S20114, the next detection action is performed only when it is determined that the received DN can be converted into an IDN, which ensures the validity of the subsequent domain name detection; and determining that the DN exists in neither the blacklist nor the whitelist shows that the DN corresponds to an IDN that has not been discovered before, which ensures the reliability of the subsequent domain name detection.
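Steps S20111-S20114 can be sketched in Python as follows. This is a minimal sketch under assumptions: the function name `idn_to_detect` and the use of plain sets for the blacklist and whitelist are hypothetical; the check uses the standard "xn--" ACE prefix that marks Punycode-encoded labels, and decoding relies on Python's built-in `idna` codec rather than any decoder named in the embodiment.

```python
from typing import Optional, Set


def idn_to_detect(dn: str, blacklist: Set[str], whitelist: Set[str]) -> Optional[str]:
    """Return the IDN to be detected, or None when no further detection applies."""
    labels = dn.lower().split(".")
    # S20112: the DN can be converted into an IDN only if some label carries
    # the IDN identification code (the "xn--" prefix).
    if not any(label.startswith("xn--") for label in labels):
        return None  # ordinary DN: resolve it to an IP address as usual
    # S20113: a DN already in the blacklist or whitelist has a known verdict.
    if dn in blacklist or dn in whitelist:
        return None
    # S20114: Punycode-decode the DN to obtain the IDN to be detected.
    return dn.encode("ascii").decode("idna")
```

As a usage example, encoding `"münchen.de"` with `"münchen.de".encode("idna").decode("ascii")` yields a DN with an `xn--` label, and passing that DN to `idn_to_detect` with empty lists returns the original Unicode name.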
[0085] In the embodiment of the present invention, in order to recognize the image to be detected and determine the first feature sequence, with reference to Figure 4 and as shown in Figure 7, S2013 in the embodiment of the present invention may specifically include S20131-S20132:
[0086] S20131. The domain name detection device 11 determines the low-frequency coefficient matrix of the image to be detected according to the image to be detected and the discrete cosine transform (DCT) algorithm.
[0087] Among them, each low-frequency coefficient in the low-frequency coefficient matrix is used to reflect the contour and grayscale distribution of the IDN to be detected in the image to be detected.
[0088] S20132. The domain name detecting device 11 uses a preset coding rule to code the low-frequency coefficient matrix of the image to be detected to generate a first feature sequence.
[0089] In a possible design, the first characteristic sequence provided by the embodiment of the present invention may be binary data.
[0090] The preset encoding rule may specifically include: if any low-frequency coefficient in the low-frequency coefficient matrix of the image to be detected is less than the second preset threshold, the digit corresponding to that low-frequency coefficient in the first feature sequence is determined to be the first value; otherwise, the digit corresponding to that low-frequency coefficient in the first feature sequence is determined to be the second value.
[0091] It should be noted that the second preset threshold may be the average value of the low-frequency coefficients in the low-frequency coefficient matrix of the image to be detected, or may be set in the domain name detection device 11 by the operation and maintenance personnel. The first value may be 1 or 0: when the first value is 1, the second value is 0; when the first value is 0, the second value is 1.
[0092] Exemplarily, when the number of elements included in the low-frequency coefficient matrix of the image to be detected is 15*15, the first feature sequence is specifically binary data including 225 characters.
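The preset encoding rule of S20132 can be sketched in Python as follows. This is an illustrative sketch only: the function name `encode_coefficients` and its parameters are hypothetical, the matrix is traversed row by row (an assumption, since the text does not fix the order), and the threshold defaults to the mean of the coefficients as one of the options the text allows.

```python
from typing import Optional, Sequence


def encode_coefficients(
    matrix: Sequence[Sequence[float]],
    threshold: Optional[float] = None,
    first_value: str = "1",
) -> str:
    """Encode a low-frequency coefficient matrix as a string of binary digits."""
    # Flatten row by row (assumed order).
    flat = [c for row in matrix for c in row]
    # Second preset threshold: default to the average of the coefficients.
    if threshold is None:
        threshold = sum(flat) / len(flat)
    # The second value is the complement of the first value.
    second_value = "0" if first_value == "1" else "1"
    # Coefficients below the threshold get the first value, the rest the second.
    return "".join(first_value if c < threshold else second_value for c in flat)
```

With a 15*15 input the result has 225 digits, matching the example above; for the toy matrix `[[1, 2], [3, 4]]` the mean is 2.5, so the first two coefficients receive the first value and the last two the second value.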
[0093] In the embodiment of the present invention, in order to determine the low-frequency coefficient matrix of the image to be detected, with reference to Figure 7 and as shown in Figure 8, S20131 in the embodiment of the present invention may specifically include Sa-Sc:
[0094] Sa. The domain name detection device 11 obtains the grayscale matrix of the image to be detected.
[0095] Among them, the gray-scale matrix of the image to be detected is used to reflect the texture characteristics of the image to be detected.
[0096] As a possible implementation manner, the domain name detection device 11 obtains the gray value of each pixel in the image to be detected, and determines the gray matrix of the image to be detected according to the gray value of each pixel.
[0097] It should be noted that, for the specific implementation of the steps in this embodiment of the present invention, reference may be made to the prior art, and details are not described herein again.
[0098] Exemplarily, taking the foregoing preset image including 64*64 pixels as an example, the gray scale matrix obtained in this step is a 64*64 matrix.
[0099] Sb. The domain name detection device 11 determines the frequency coefficient matrix of the image to be detected according to the gray matrix of the image to be detected and the DCT algorithm.
[0100] As a possible implementation manner, the domain name detection device 11 inputs the gray scale matrix of the image to be detected into the DCT algorithm to generate the frequency coefficient matrix of the image to be detected.
[0101] It should be noted that the frequency coefficient matrix of the image to be detected includes the low-frequency coefficient matrix and the high-frequency coefficient matrix of the image to be detected; each element in the frequency coefficient matrix of the image to be detected is used to reflect how sharply the grayscale changes across the pixels of the image to be detected.
[0102] Each element in the low-frequency coefficient matrix of the image to be detected is a low-frequency coefficient of the image to be detected; each element in the high-frequency coefficient matrix of the image to be detected is a high-frequency coefficient of the image to be detected, and the high-frequency coefficients are used to reflect the detailed information of the image to be detected.
[0103] It should be noted that, for the specific implementation method of the DCT algorithm in this step in the embodiment of the present invention, reference may be made to the prior art, which will not be repeated here.
[0104] Sc. The domain name detection device 11 determines the low-frequency coefficient matrix of the image to be detected from the frequency coefficient matrix of the image to be detected.
[0105] As a possible implementation, the AC coefficients are removed from the upper left corner area of the frequency coefficient matrix of the image to be detected to obtain the low-frequency coefficient matrix of the image to be detected.
[0106] The upper left corner area of the frequency coefficient matrix of the image to be detected includes the low-frequency coefficient matrix of the image to be detected. In the upper left corner area, the elements in the first row and the first column are the AC coefficients of the image to be detected; the lower right corner area of the frequency coefficient matrix contains the high-frequency coefficient matrix of the image to be detected.
[0107] It should be noted that the AC coefficient of the image to be detected is used to reflect the content displayed on the edge of the image to be detected.
[0108] Exemplarily, when the preset image includes 64*64 pixels, the frequency coefficient matrix of the image to be detected is as shown in Figure 9, and the number of elements of the frequency coefficient matrix obtained after DCT conversion is also 64*64. The upper left corner area of the frequency coefficient matrix contains 16*16 elements, comprising the AC coefficients and the low-frequency coefficients of the image to be detected: there are 31 AC coefficients (the first row and the first column, 16 + 16 - 1), and the low-frequency coefficient matrix has 15*15 elements.
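Steps Sb and Sc can be sketched in Python as follows. This is a sketch under assumptions: the text defers the DCT implementation to the prior art, so an orthonormal type-II DCT applied separably to rows and columns is used here as one common choice; the function names and the `corner` parameter are hypothetical. The elements of the first row and first column of the upper left area (the ones the text calls AC coefficients) are dropped by slicing.

```python
import math
from typing import List, Sequence


def dct_1d(values: Sequence[float]) -> List[float]:
    """Orthonormal type-II DCT of a one-dimensional sequence."""
    n = len(values)
    out = []
    for k in range(n):
        total = sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, v in enumerate(values))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


def dct_2d(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sb: two-dimensional DCT, computed on rows first and then on columns."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(col) for col in zip(*rows)]   # transform each column
    return [list(row) for row in zip(*cols)]     # transpose back


def low_frequency_matrix(gray: Sequence[Sequence[float]],
                         corner: int = 16) -> List[List[float]]:
    """Sc: take the upper left corner*corner area of the frequency coefficient
    matrix and drop its first row and first column."""
    coeffs = dct_2d(gray)
    return [row[1:corner] for row in coeffs[1:corner]]
```

For a 64*64 grayscale matrix and the default `corner=16`, the result is a 15*15 matrix, consistent with the example above.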
[0109] In a possible design, in order to reduce the data processing workload for the image to be detected, with reference to Figure 8 and as shown in Figure 10, Sa in the embodiment of the present invention may specifically include Sa1-Sa2:
[0110] Sa1, the domain name detection device 11 performs image binarization processing on the image to be detected to generate an intermediate image.
[0111] As a possible implementation manner, the domain name detection device 11 sets the gray values of all pixels in the image to be detected to 0 or 255, and displays the image to be detected as a black and white result.
[0112] It should be noted that in the intermediate image, the gray value of the pixels included in the IDN to be detected is 255, and the gray value of the pixels in the background part is 0.
[0113] Sa2, the domain name detection device 11 obtains the grayscale matrix of the intermediate image.
[0114] It should be noted that, for the specific implementation of this step, reference may be made to the above step Sa, which will not be repeated here.
[0115] It is understandable that, with the above Sa1-Sa2, the gray value of each pixel in the background of the image to be detected is set to 0 and only the gray value of the IDN to be detected is retained; that is, the background color of the image to be detected is removed, which reduces the computational pressure on the domain name detection device 11 in subsequent data processing and saves hardware resources.
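The binarization of Sa1 can be sketched in Python as follows. This is a minimal sketch under assumptions: the function name `binarize` and the cutoff of 128 are hypothetical, and the input is assumed to show dark glyphs on a light background, so that dark pixels are treated as belonging to the IDN.

```python
from typing import List, Sequence


def binarize(gray: Sequence[Sequence[int]], cutoff: int = 128) -> List[List[int]]:
    """Sa1: map every pixel to 0 or 255.

    Pixels darker than the cutoff are assumed to belong to the IDN and are set
    to 255; all other pixels are treated as background and are set to 0."""
    return [[255 if value < cutoff else 0 for value in row] for row in gray]
```

The returned matrix can be used directly as the grayscale matrix of the intermediate image in Sa2.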
[0116] In one design, when the first feature sequence and each feature sequence in the feature sequence set are binary data, in order to determine whether the IDN to be detected is an illegal IDN, with reference to Figure 3 and as shown in Figure 11, S202 in the embodiment of the present invention specifically includes S2021:
[0117] S2021: The domain name detection device 11 calculates an edit distance between the first feature sequence and each feature sequence in the stored feature sequence set.
[0118] Wherein, each edit distance in the determined edit distance set is used to reflect the similarity between each feature sequence in the feature sequence set and the first feature sequence. The smaller the edit distance of the two feature sequences, the higher the similarity between the two feature sequences.
[0119] It should be noted that the generation manner of each feature sequence in the feature sequence set is the same as the generation manner of the first feature sequence, and each feature sequence in the feature sequence set has the same number of digits as the first feature sequence.
[0120] It should be noted that, in this step provided in the embodiment of the present invention, the specific method for calculating the edit distance between two binary data can refer to the prior art, which will not be repeated here.
[0121] In the case that the first feature sequence and each feature sequence in the feature sequence set are binary data, and the edit distance between the binary data reflects the similarity between the feature sequences, S203 in the embodiment of the present invention specifically includes S2031:
[0122] S2031. The domain name detection device 11 determines the edit distance with the smallest value among the calculated edit distances as the target edit distance.
[0123] In the case that the first feature sequence and each feature sequence in the feature sequence set are binary data, and the edit distance between the binary data reflects the similarity between the feature sequences, S204 in the embodiment of the present invention specifically includes S2041:
[0124] S2041. The domain name detecting device 11 determines whether the target edit distance is less than or equal to a third preset threshold.
[0125] It should be noted that the third preset threshold decreases as the first preset threshold increases, and the third preset threshold may also be set in the domain name detection device 11 by the operation and maintenance personnel.
[0126] In the case where the first feature sequence and each feature sequence in the feature sequence set are binary data, and the edit distance between the binary data reflects the similarity between the feature sequences, S205 provided in the embodiment of the present invention specifically includes S2051 :
[0127] S2051. If the target edit distance is less than or equal to the third preset threshold, the domain name detection device 11 determines that the IDN to be detected is illegal.
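Steps S2021-S2051 can be sketched in Python as follows. This is only a sketch: the text defers the edit distance calculation to the prior art, so the classic Levenshtein distance (insertions, deletions, substitutions) is assumed here, and the function names are hypothetical. The comparison follows S2051, which treats a target edit distance less than or equal to the third preset threshold as illegal.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def is_illegal_by_edit_distance(first_seq: str,
                                legal_seqs: Iterable[str],
                                third_threshold: int) -> bool:
    # S2021: edit distance to every stored feature sequence.
    distances = [edit_distance(first_seq, seq) for seq in legal_seqs]
    if not distances:
        return False  # nothing to compare against (assumed behaviour)
    # S2031: the smallest distance is the target edit distance.
    target = min(distances)
    # S2041/S2051: a small enough distance means the IDN is judged illegal.
    return target <= third_threshold
```

Because a smaller edit distance means a higher similarity, taking the minimum here plays the same role as taking the maximum similarity in S203.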
[0128] In the embodiment of the present invention, in order to determine the above-mentioned feature sequence set, with reference to Figure 3 and as shown in Figure 12, the domain name detection method provided in the embodiment of the present invention further includes S1-S3 before S202:
[0129] S1: The domain name detection device 11 obtains multiple legal domain names.
[0130] Exemplarily, each of the multiple legal domain names may be like domain name 3 shown in Figure 1.
[0131] In one design, the multiple legal domain names in this step may also include legal IDNs, for example, the legal domain name with serial number 3 shown in Figure 13.
[0132] It is understandable that, because the multiple legal domain names provided by the embodiment of the present invention may include legal IDNs, the domain name detection device 11 can determine whether the IDN to be detected is disguised as another legal IDN.
[0133] S2. The domain name detection device 11 respectively generates a characteristic sequence of each legal domain name among the plurality of legal domain names according to a plurality of legal domain names, preset rules and preset images.
[0134] Among them, the characteristic sequence of each legal domain name among the plurality of legal domain names is used to uniquely identify one legal domain name among the plurality of legal domain names.
[0135] It should be noted that the characteristic sequence of each legal domain name is specifically used to uniquely identify the content displayed when that legal domain name is loaded into the preset image according to the preset rule.
[0136] It should be noted that, for the specific implementation manner of this step in the embodiment of the present invention, refer to the foregoing S2011-S2013, which will not be repeated here.
[0137] In one design, in order to ensure the accuracy of the domain name detection performed by the domain name detection device 11, as shown in Figure 5(b), the preset rule, preset image, DCT algorithm, and preset encoding rule used in S2 when generating the characteristic sequence of each legal domain name are the same as the preset rule, preset image, DCT algorithm, and preset encoding rule used in S2011-S2013.
[0138] S3. The domain name detection device 11 stores a plurality of legal domain names and a characteristic sequence of each legal domain name among the plurality of legal domain names to generate a characteristic sequence set.
[0139] It is understandable that the above S1-S3 provided in the embodiment of the present invention can generate a feature sequence set, which provides a data basis for determining the similarity in S202. At the same time, since the generation method and conditions of each feature sequence in the feature sequence set are the same as the first feature sequence, the accuracy of the domain name detection device 11 can be guaranteed.
[0140] Exemplarily, the feature sequence set may be as shown in Figure 13, where the feature sequence set includes multiple legal domain names and the feature sequence of each of the multiple legal domain names.
[0141] In one design, the feature sequence set contains a massive number of feature sequences, and when hackers disguise a domain name, in order to make the illegal domain name hard for users to distinguish, they set the string length of the illegal domain name to be the same as that of the legal domain name. Therefore, in order to reduce the workload of the domain name detection device 11 in determining the similarity and to save calculation time, as shown in Figure 13, the feature sequence set in the embodiment of the present invention also includes the string length of each of the multiple legal domain names. With reference to Figure 3 and as shown in Figure 14, S202 in the embodiment of the present invention specifically further includes S2022-S2024:
[0142] S2022. The domain name detecting device 11 obtains the length of the character string of the IDN to be detected.
[0143] S2023. The domain name detection device 11 determines the first target feature sequence in the feature sequence set according to the length of the character string of the IDN to be detected.
[0144] The length of the character string of the legal domain name corresponding to the first target characteristic sequence is the same as the length of the character string of the IDN to be detected.
[0145] S2024. The domain name detection device 11 calculates the similarity between the first feature sequence and the determined first target feature sequence.
[0146] It should be noted that the calculation of the similarity between the feature sequences in this step may refer to the above S2021 for details, which will not be repeated here.
[0147] It is understandable that, with the above S2022-S2024, the first target feature sequence can be determined from the feature sequence set; that is, the legal domain names whose string length is the same as that of the IDN to be detected are selected from the massive feature sequences, which reduces the calculation pressure of the domain name detection device 11 and saves calculation time.
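The length-based screening of S2022-S2023 can be sketched in Python as follows. This is an illustrative sketch under assumptions: the feature sequence set is modelled as (domain, feature sequence) pairs grouped in a dictionary keyed by string length, the function names are hypothetical, and string length is taken as the number of Unicode characters in the domain name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (legal domain name, feature sequence)


def build_length_index(records: Iterable[Record]) -> Dict[int, List[Record]]:
    """Group the stored legal domain names by string length, so that the
    length kept alongside each feature sequence can be looked up directly."""
    index: Dict[int, List[Record]] = defaultdict(list)
    for domain, feature_seq in records:
        index[len(domain)].append((domain, feature_seq))
    return index


def first_target_sequences(index: Dict[int, List[Record]], idn: str) -> List[Record]:
    """S2022-S2023: keep only the records whose domain name has the same
    string length as the IDN to be detected."""
    return index.get(len(idn), [])
```

The similarity of S2024 then only needs to be computed against the records returned by `first_target_sequences` instead of the whole set.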
[0148] In another design, the feature sequence set contains a massive number of feature sequences, and when hackers disguise a domain name, in order to make the illegal domain name hard for users to distinguish, they write the illegal domain name in a language family whose characters are highly similar to the characters of the language family of the legal domain name. Therefore, in order to reduce the workload of the domain name detection device 11 in determining the similarity and to save calculation time, as shown in Figure 13, the feature sequence set of the embodiment of the present invention also includes the language family identifier of each of the multiple legal domain names. With reference to Figure 12 and as shown in Figure 15, S202 in the embodiment of the present invention specifically further includes S2025-S2028:
[0149] S2025. The domain name detecting device 11 obtains the language family identifier of the IDN to be detected.
[0150] S2026. The domain name detecting device 11 determines the target language family identifier according to the language family identifier of the IDN to be detected.
[0151] Wherein, the similarity between the characters corresponding to the target language family and the characters contained in the language family of the IDN to be detected is greater than the fourth preset threshold.
[0152] As a possible implementation manner, the domain name detection device 11 can query the target language family identifier from the language family correspondence list stored in advance; the language family correspondence list contains the similarity between each language family.
[0153] It should be noted that the fourth preset threshold may be specifically set in the domain name detection device 11 by the operation and maintenance personnel.
[0154] S2027. The domain name detection device 11 determines the second target feature sequences in the feature sequence set according to the target language family identifier.
[0155] Each second target feature sequence in the set of second target feature sequences corresponds to the target language family identifier.
[0156] S2028. The domain name detection device 11 calculates the similarity between the first feature sequence and the determined second target feature sequence.
[0157] It should be noted that the calculation of the similarity between the feature sequences in this step may refer to the above S2021 for details, which will not be repeated here.
[0158] It can be understood that, by adopting S2025-S2028 above, the embodiment of the present invention determines the second target feature sequences from the feature sequence set; that is, it screens out of the massive feature sequence set the legal domain names that the IDN to be detected could be disguising. This reduces the calculation pressure on the domain name detection device 11 and saves calculation time.
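The language family screening of S2026-S2027 can be illustrated with the sketch below. The correspondence list contents, the family identifiers, the 0-to-1 similarity scale, and the dictionary layout of the feature sequence set are hypothetical; the text only states that such a list is stored in advance and that the fourth preset threshold is configured by operation and maintenance personnel.

```python
# Hypothetical language family correspondence list: similarity between the
# characters of two language families, on an assumed 0..1 scale.
FAMILY_SIMILARITY = {
    frozenset({"cyrillic", "latin"}): 0.8,
    frozenset({"cyrillic", "greek"}): 0.6,
    frozenset({"cyrillic", "han"}): 0.1,
}


def target_families(idn_family: str, fourth_threshold: float) -> set[str]:
    """S2026: families whose similarity to the IDN's family exceeds the threshold."""
    targets: set[str] = set()
    for pair, sim in FAMILY_SIMILARITY.items():
        if idn_family in pair and sim > fourth_threshold:
            targets |= pair - {idn_family}
    return targets


def second_target_entries(feature_set: list[dict], families: set[str]) -> list[dict]:
    """S2027: keep entries whose language family identifier is a target."""
    return [e for e in feature_set if e["family"] in families]


feature_set = [
    {"domain": "apple.com", "family": "latin", "feature": "1011"},
    {"domain": "example.cn", "family": "han", "feature": "0110"},
    {"domain": "alpha.gr", "family": "greek", "feature": "1100"},
]
families = target_families("cyrillic", fourth_threshold=0.5)
selected = second_target_entries(feature_set, families)
print(sorted(families), [e["domain"] for e in selected])
```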
[0159] In another design, in order to reduce the working pressure of the domain name detection device 11 when determining the similarity and to save calculation time, S202 provided in the embodiment of the present invention may also specifically be: determining the third target feature sequences in the feature sequence set according to the determined first target feature sequences and the determined second target feature sequences, and calculating the similarity between the first feature sequence and each determined third target feature sequence. The third target feature sequences are the intersection of the determined first target feature sequences and the determined second target feature sequences.
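As a small illustration of this combined design, the intersection can be taken over the legal domain names that key the two candidate sets. The dictionary representation (domain name mapped to a hypothetical bit-string feature sequence) is an assumption made for the example.

```python
def third_target_entries(first_targets: dict[str, str],
                         second_targets: dict[str, str]) -> dict[str, str]:
    """Keep only the legal domain names present in both candidate sets."""
    return {d: f for d, f in first_targets.items() if d in second_targets}


# Candidates with the same string length as the IDN to be detected.
first_targets = {"apple.com": "1011", "baidu.com": "1100"}
# Candidates whose language family is similar to that of the IDN to be detected.
second_targets = {"apple.com": "1011", "google.com": "0110"}

third_targets = third_target_entries(first_targets, second_targets)
print(third_targets)
```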
[0160] The domain name detection method provided by the embodiment of the present invention is applied to detecting illegal domain names. The method determines whether the IDN to be detected is disguised as a legal domain name by calculating the similarity between the first feature sequence of the IDN to be detected and the feature sequences of legal domain names. Since the first feature sequence uniquely identifies the IDN to be detected, and each feature sequence in the feature sequence set uniquely identifies a legal domain name, the similarity between the first feature sequence and each feature sequence in the feature sequence set reflects the degree of similarity between the IDN to be detected and each of the multiple legal domain names. Furthermore, if the highest of the determined similarities exceeds the first preset threshold, it can be determined that the IDN to be detected is disguised as a legal domain name. In this way, illegal IDNs can be accurately detected.
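The overall decision (S202-S205) can be sketched as below. The `similarity` function is a stand-in chosen only for illustration (the share of matching positions between equal-length bit strings); it is not the similarity measure of S2021. The sample feature sequences and the threshold value are likewise hypothetical.

```python
def similarity(a: str, b: str) -> float:
    """Assumed measure: share of matching positions; 0.0 if the lengths differ."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def is_illegal(first_sequence: str, feature_set: dict[str, str],
               first_threshold: float) -> bool:
    # S202: similarity to every stored feature sequence.
    sims = [similarity(first_sequence, seq) for seq in feature_set.values()]
    # S203: the target similarity is the highest of them.
    target = max(sims, default=0.0)
    # S204-S205: illegal if the target similarity exceeds the threshold.
    return target > first_threshold


feature_set = {"apple.com": "10110010", "baidu.com": "01001101"}
print(is_illegal("10110011", feature_set, first_threshold=0.8))
print(is_illegal("01010101", feature_set, first_threshold=0.8))
```

In the first call the best match shares 7 of 8 positions with a stored sequence, so the threshold is exceeded; in the second the best match shares 6 of 8 positions, so it is not.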
[0161] The foregoing mainly introduces the solutions provided by the embodiments of the present invention from the perspective of methods. In order to realize the above-mentioned functions, the domain name detection device includes hardware structures and/or software modules corresponding to each function. Those skilled in the art should easily realize that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the embodiments of the present invention can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods for each specific application to implement the described functions, but such implementation should not be considered as going beyond the scope of the present invention.
[0162] The embodiment of the present invention may divide the domain name detection apparatus 11 into functional modules according to the foregoing method examples. For example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The above-mentioned integrated modules can be implemented in the form of hardware or software functional modules. Optionally, the division of modules in the embodiment of the present invention is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
[0163] Figure 16 is a schematic structural diagram of a domain name detection device provided by an embodiment of the present invention. As shown in Figure 16, the domain name detection device 11 is used to detect the legitimacy of an acquired IDN, for example, to perform the domain name detection method shown in Figure 3. The domain name detection device 11 includes an obtaining unit 111, a determining unit 112, and a judging unit 113.
[0164] The obtaining unit 111 is configured to obtain the first feature sequence. The first feature sequence is used to uniquely identify the internationalized domain name (IDN) to be detected. For example, with reference to Figure 3, the obtaining unit 111 may be used to execute S201.
[0165] The determining unit 112 is configured to determine, according to the first feature sequence obtained by the obtaining unit 111, the similarity between the first feature sequence and each feature sequence in the stored feature sequence set, where each feature sequence in the feature sequence set is used to uniquely identify a legal domain name. For example, with reference to Figure 3, the determining unit 112 may be used to execute S202.
[0166] The determining unit 112 is further configured to determine the target similarity from the determined similarities, where the target similarity is the similarity with the highest value among the determined similarities. For example, with reference to Figure 3, the determining unit 112 may be used to execute S203.
[0167] The judging unit 113 is configured to judge whether the target similarity determined by the determining unit 112 is greater than the first preset threshold. For example, with reference to Figure 3, the judging unit 113 may be used to execute S204.
[0168] The determining unit 112 is further configured to determine that the IDN to be detected is illegal if the target similarity is greater than the first preset threshold. For example, with reference to Figure 3, the determining unit 112 may be used to execute S205.
[0169] Optionally, as shown in Figure 17, the obtaining unit 111 provided in the embodiment of the present invention specifically includes an obtaining subunit 1111, a generating subunit 1112, and a determining subunit 1113.
[0170] The obtaining subunit 1111 is configured to obtain the IDN to be detected. For example, with reference to Figure 4, the obtaining subunit 1111 may be used to execute S2011.
[0171] The generating subunit 1112 is configured to load the IDN to be detected obtained by the obtaining subunit 1111 into a preset image according to a preset rule to generate the image to be detected. For example, with reference to Figure 4, the generating subunit 1112 may be used to execute S2012.
[0172] The determining subunit 1113 is configured to identify the image to be detected generated by the generating subunit 1112 to determine the first feature sequence. For example, with reference to Figure 4, the determining subunit 1113 may be used to execute S2013.
[0173] Optionally, as shown in Figure 17, the determining subunit 1113 provided by the embodiment of the present invention is specifically configured to determine the low-frequency coefficient matrix of the image to be detected according to the image to be detected and the discrete cosine transform (DCT) algorithm, where each low-frequency coefficient in the low-frequency coefficient matrix of the image to be detected is used to reflect the contour and grayscale distribution of the IDN to be detected in the image to be detected. For example, with reference to Figure 7, the determining subunit 1113 may be used to execute S20131.
[0174] The determining subunit 1113 is specifically further configured to encode the low-frequency coefficient matrix of the image to be detected by using a preset encoding rule to generate the first feature sequence. For example, with reference to Figure 7, the determining subunit 1113 may be used to execute S20132.
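The preset encoding rule of S20132 is not spelled out in this passage. One plausible rule, borrowed from perceptual hashing and used here purely as an assumption, is to emit one bit per low-frequency coefficient according to whether the coefficient exceeds the mean of the matrix:

```python
def encode_low_frequency(matrix: list[list[float]]) -> str:
    """Assumed encoding rule: '1' where a coefficient exceeds the matrix mean."""
    flat = [c for row in matrix for c in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if c > mean else "0" for c in flat)


# Hypothetical 2 x 2 low-frequency coefficient matrix; its mean is 3.0.
low_frequency = [[10.0, -2.0],
                 [3.0, 1.0]]
print(encode_low_frequency(low_frequency))
```

A bit-string output of this kind is convenient because two feature sequences can then be compared position by position or by edit distance.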
[0175] Optionally, as shown in Figure 17, the domain name detection device 11 provided in the embodiment of the present invention further includes a generating unit 114.
[0176] The obtaining unit 111 is further configured to obtain multiple legal domain names. For example, with reference to Figure 12, the obtaining unit 111 may be used to execute S1.
[0177] The generating unit 114 is configured to generate the feature sequence of each legal domain name among the multiple legal domain names according to the multiple legal domain names obtained by the obtaining unit 111, the preset rule, and the preset image, where the feature sequence of each legal domain name is used to uniquely identify that legal domain name among the multiple legal domain names. For example, with reference to Figure 12, the generating unit 114 may be used to execute S2.
[0178] The generating unit 114 is further configured to, after generating the feature sequence of each legal domain name among the multiple legal domain names, store the multiple legal domain names and the feature sequence of each legal domain name to generate the feature sequence set. For example, with reference to Figure 12, the generating unit 114 may be used to execute S3.
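The construction of the feature sequence set in S1-S3 can be sketched as a mapping from each legal domain name to its feature sequence. In the sketch, `toy_feature` is only a placeholder so that the example runs; it does not render the domain name into an image or apply the DCT, and both function names are hypothetical.

```python
from typing import Callable


def build_feature_set(legal_domains: list[str],
                      feature_fn: Callable[[str], str]) -> dict[str, str]:
    """S2-S3: generate and store one feature sequence per legal domain name."""
    return {domain: feature_fn(domain) for domain in legal_domains}


def toy_feature(domain: str) -> str:
    """Placeholder feature: 4 bits per character, NOT the image-based feature."""
    return "".join(format(ord(ch) % 16, "04b") for ch in domain)


feature_set = build_feature_set(["a.cn", "b.cn"], toy_feature)  # S1: sample input
print(feature_set["a.cn"])
```

In a fuller version the same feature function would be applied to the IDN to be detected, so that the stored sequences and the first feature sequence are comparable.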
[0179] Optionally, the feature sequence set provided in the embodiment of the present invention further includes the string length of each legal domain name among the multiple legal domain names. As shown in Figure 17, the determining unit 112 of the embodiment of the present invention is specifically further configured to obtain the string length of the IDN to be detected. For example, with reference to Figure 14, the determining unit 112 may be used to execute S2022.
[0180] The determining unit 112 is specifically further configured to, after obtaining the string length of the IDN to be detected, determine the first target feature sequence in the feature sequence set according to the string length of the IDN to be detected, where the string length of the legal domain name corresponding to the first target feature sequence is the same as the string length of the IDN to be detected. For example, with reference to Figure 14, the determining unit 112 may be used to execute S2023.
[0181] The determining unit 112 is specifically further configured to calculate the similarity between the first feature sequence and the determined first target feature sequence. For example, with reference to Figure 14, the determining unit 112 may be used to execute S2024.
[0182] Optionally, the feature sequence set provided in the embodiment of the present invention further includes the language family identifier of each legal domain name among the multiple legal domain names. As shown in Figure 17, the determining unit 112 in the embodiment of the present invention is specifically further configured to obtain the language family identifier of the IDN to be detected. For example, with reference to Figure 15, the determining unit 112 may be used to execute S2025.
[0183] The determining unit 112 is specifically further configured to, after obtaining the language family identifier of the IDN to be detected, determine the target language family identifier according to the language family identifier of the IDN to be detected, where the similarity between the characters corresponding to the target language family identifier and the characters contained in the language family of the IDN to be detected is greater than the fourth preset threshold. For example, with reference to Figure 15, the determining unit 112 may be used to execute S2026.
[0184] The determining unit 112 is specifically further configured to, after determining the target language family identifier, determine the second target feature sequences in the feature sequence set according to the target language family identifier, where each second target feature sequence in the set of second target feature sequences corresponds to the target language family identifier. For example, with reference to Figure 15, the determining unit 112 may be used to execute S2027.
[0185] The determining unit 112 is specifically further configured to calculate the similarity between each second target feature sequence in the set of second target feature sequences and the first feature sequence. For example, with reference to Figure 15, the determining unit 112 may be used to execute S2028.
[0186] Optionally, as shown in Figure 17, the obtaining unit 111 provided in the embodiment of the present invention is specifically further configured to receive the DN sent by the network device 12. For example, with reference to Figure 6, the obtaining unit 111 may be used to execute S20111.
[0187] The obtaining unit 111 is specifically further configured to determine whether the DN sent by the network device 12 can be converted into an IDN. For example, with reference to Figure 6, the obtaining unit 111 may be used to execute S20112.
[0188] The obtaining unit 111 is specifically further configured to, if it is determined that the DN sent by the network device 12 can be converted into an IDN, query whether the DN sent by the network device 12 is included in the blacklist and the whitelist. For example, with reference to Figure 6, the obtaining unit 111 may be used to execute S20113.
[0189] The obtaining unit 111 is specifically further configured to, if it is determined that the DN sent by the network device 12 is included in neither the blacklist nor the whitelist, perform Punycode decoding on the received DN to obtain the IDN to be detected. For example, with reference to Figure 6, the obtaining unit 111 may be used to execute S20114.
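The DN handling of S20112-S20114 can be sketched with the `punycode` codec from the Python standard library. The sketch assumes that "can be converted into an IDN" means that at least one label carries the `xn--` ACE prefix, and that the blacklist and whitelist are plain sets of domain names; both are assumptions, and the function name `to_idn` is hypothetical.

```python
from typing import Optional


def to_idn(dn: str, blacklist: set[str], whitelist: set[str]) -> Optional[str]:
    """Return the decoded IDN to be detected, or None if no detection is needed."""
    dn = dn.lower()
    labels = dn.split(".")
    # S20112 (assumed test): the DN is convertible if a label has the ACE prefix.
    if not any(label.startswith("xn--") for label in labels):
        return None
    # S20113: a DN already present in either list needs no further detection.
    if dn in blacklist or dn in whitelist:
        return None
    # S20114: Punycode-decode every ACE label, leave the other labels unchanged.
    decoded = [
        label[4:].encode("ascii").decode("punycode")
        if label.startswith("xn--") else label
        for label in labels
    ]
    return ".".join(decoded)


print(to_idn("xn--mnchen-3ya.de", blacklist=set(), whitelist=set()))
```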
[0190] Optionally, as shown in Figure 17, the obtaining unit 111 provided in the embodiment of the present invention is specifically further configured to, after obtaining the image to be detected, determine the low-frequency coefficient matrix of the image to be detected according to the image to be detected and the discrete cosine transform (DCT) algorithm. For example, with reference to Figure 7, the obtaining unit 111 may be used to execute S20131.
[0191] The obtaining unit 111 is specifically further configured to, after determining the low-frequency coefficient matrix of the image to be detected, encode the low-frequency coefficient matrix of the image to be detected by using a preset encoding rule to generate the first feature sequence. For example, with reference to Figure 7, the obtaining unit 111 may be used to execute S20132.
[0192] Optionally, as shown in Figure 17, the obtaining unit 111 provided in the embodiment of the present invention is specifically further configured to obtain the grayscale matrix of the image to be detected. For example, with reference to Figure 8, the obtaining unit 111 may be used to execute Sa.
[0193] The obtaining unit 111 is specifically further configured to, after obtaining the grayscale matrix of the image to be detected, determine the frequency coefficient matrix of the image to be detected according to the grayscale matrix of the image to be detected and the DCT algorithm. For example, with reference to Figure 8, the obtaining unit 111 may be used to execute Sb.
[0194] The obtaining unit 111 is specifically further configured to, after determining the frequency coefficient matrix of the image to be detected, determine the low-frequency coefficient matrix of the image to be detected from the frequency coefficient matrix of the image to be detected. For example, with reference to Figure 8, the obtaining unit 111 may be used to execute Sc.
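Steps Sb and Sc can be illustrated with a direct, unoptimized two-dimensional DCT-II written with the standard library only. The orthonormal scaling and the choice of the top-left block as the low-frequency coefficient matrix follow common DCT practice and are assumptions here; the block size is a hypothetical parameter. The quadruple loop is suitable only for small images.

```python
import math


def dct_2d(gray: list[list[float]]) -> list[list[float]]:
    """Sb: orthonormal 2-D DCT-II of an n x m grayscale matrix."""
    n, m = len(gray), len(gray[0])

    def alpha(k: int, size: int) -> float:
        return math.sqrt(1 / size) if k == 0 else math.sqrt(2 / size)

    out = [[0.0] * m for _ in range(n)]
    for u in range(n):
        for v in range(m):
            total = 0.0
            for x in range(n):
                for y in range(m):
                    total += (gray[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * m)))
            out[u][v] = alpha(u, n) * alpha(v, m) * total
    return out


def low_frequency_block(coeffs: list[list[float]], size: int) -> list[list[float]]:
    """Sc (assumed selection): keep the top-left size x size coefficients."""
    return [row[:size] for row in coeffs[:size]]


gray = [[8.0] * 4 for _ in range(4)]   # a flat 4 x 4 image with gray value 8
coeffs = dct_2d(gray)
low = low_frequency_block(coeffs, 2)
print(round(low[0][0], 6))
```

For a flat image all of the energy falls into the single coefficient at position (0, 0), which is why the low-frequency corner is the part that captures the overall grayscale distribution.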
[0195] Optionally, as shown in Figure 17, the obtaining unit 111 provided in the embodiment of the present invention is specifically configured to perform image binarization processing on the image to be detected to generate an intermediate image. For example, with reference to Figure 10, the obtaining unit 111 may be used to execute Sa1.
[0196] The obtaining unit 111 is specifically further configured to obtain the grayscale matrix of the intermediate image. For example, with reference to Figure 10, the obtaining unit 111 may be used to execute Sa2.
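A minimal sketch of Sa1-Sa2 follows. It assumes a fixed threshold of 128, a dark background with brighter text pixels, and an image already given as a matrix of gray values from 0 to 255; none of these details is fixed by the text, and the function name is hypothetical.

```python
def binarize(gray: list[list[int]], threshold: int = 128) -> list[list[int]]:
    """Sa1: background pixels (below the threshold) become 0, text pixels 255."""
    return [[255 if p >= threshold else 0 for p in row] for row in gray]


# Hypothetical 2 x 2 image to be detected; the right column holds the text.
image = [[12, 200],
         [90, 130]]
intermediate = binarize(image)   # Sa2: grayscale matrix of the intermediate image
print(intermediate)
```

After this step the background no longer contributes to the grayscale matrix, so the DCT coefficients depend on the rendered characters alone.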
[0197] Optionally, as shown in Figure 16, the determining unit 112 provided in the embodiment of the present invention is specifically configured to, after the first feature sequence of the IDN to be detected is obtained, calculate the edit distance between the first feature sequence and each feature sequence in the stored feature sequence set. For example, with reference to Figure 11, the determining unit 112 may be used to execute S2021.
[0198] The determining unit 112 is specifically further configured to determine the edit distance with the smallest value among the calculated edit distances as the target edit distance. For example, with reference to Figure 11, the determining unit 112 may be used to execute S2031.
[0199] The judging unit 113 is specifically configured to judge whether the target edit distance is less than the third preset threshold. For example, with reference to Figure 11, the judging unit 113 may be used to execute S2041.
[0200] The determining unit 112 is specifically further configured to determine that the IDN to be detected is illegal if the target edit distance is less than or equal to the third preset threshold. For example, with reference to Figure 11, the determining unit 112 may be used to execute S2051.
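The edit-distance variant (S2021, S2031, S2051) can be sketched with the classic Levenshtein dynamic program. The feature sequences are assumed to be strings, the sample values and threshold are hypothetical, and the comparison uses "less than or equal to" as stated for S2051.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def is_illegal_by_distance(first_sequence: str, feature_sequences: list[str],
                           third_threshold: int) -> bool:
    # S2021 and S2031: the target edit distance is the smallest distance.
    # feature_sequences is assumed to be non-empty.
    target = min(edit_distance(first_sequence, s) for s in feature_sequences)
    # S2051: illegal if the target edit distance does not exceed the threshold.
    return target <= third_threshold


print(edit_distance("kitten", "sitting"))
print(is_illegal_by_distance("1011", ["1001", "0000"], third_threshold=1))
```

A small edit distance plays the same role as a high similarity: the closer the first feature sequence is to a stored one, the more likely the IDN is imitating that legal domain name.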
[0201] In the case where the functions of the above-mentioned integrated modules are implemented in the form of hardware, the embodiment of the present invention provides another possible schematic structural diagram of the domain name detection device involved in the above-mentioned embodiment. As shown in Figure 18, a domain name detection device 30 is used to detect an illegal IDN, for example, to perform the domain name detection method shown in Figure 3. The domain name detection device 30 includes a processor 301, a memory 302, a communication interface 303, and a bus 304. The processor 301, the memory 302, and the communication interface 303 may be connected through the bus 304.
[0202] The processor 301 is the control center of the communication device, and may be a processor or a collective name for multiple processing elements. For example, the processor 301 may be a general-purpose central processing unit (central processing unit, CPU), or other general-purpose processors. Among them, the general-purpose processor may be a microprocessor or any conventional processor.
[0203] As an embodiment, the processor 301 may include one or more CPUs, for example, CPU 0 and CPU 1 shown in Figure 18.
[0204] The memory 302 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
[0205] As a possible implementation manner, the memory 302 may exist independently of the processor 301, and the memory 302 may be connected to the processor 301 through a bus 304 for storing instructions or program codes. When the processor 301 calls and executes the instructions or program codes stored in the memory 302, it can implement the domain name detection method provided in the embodiment of the present invention.
[0206] In another possible implementation manner, the memory 302 may also be integrated with the processor 301.
[0207] The communication interface 303 is used to connect with other devices through a communication network. The communication network may be Ethernet, wireless access network, wireless local area networks (WLAN), etc. The communication interface 303 may include a receiving unit for receiving data, and a sending unit for sending data.
[0208] The bus 304 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus. The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used in Figure 18, but this does not mean that there is only one bus or one type of bus.
[0209] It should be pointed out that the structure shown in Figure 18 does not constitute a limitation on the domain name detection device 30. In addition to the components shown in Figure 18, the domain name detection device 30 may include more or fewer components than shown, or combine some components, or have a different component arrangement.
[0210] As an example, with reference to Figure 16, the functions implemented by the obtaining unit 111, the determining unit 112, and the judging unit 113 in the domain name detection device are the same as the functions of the processor 301 in Figure 18.
[0211] Figure 19 shows another hardware structure of the domain name detection device in the embodiment of the present invention. As shown in Figure 19, the domain name detection device 40 may include a processor 401 and a communication interface 402. The processor 401 is coupled with the communication interface 402.
[0212] For the function of the processor 401, reference may be made to the description of the processor 301 above. In addition, the processor 401 also has a storage function, for which reference may be made to the function of the memory 302 described above.
[0213] The communication interface 402 is used to provide data for the processor 401. The communication interface 402 may be an internal interface of the communication device or an external interface of the communication device (equivalent to the communication interface 303).
[0214] It should be pointed out that the structure shown in Figure 18 (or Figure 19) does not constitute a limitation on the communication device. In addition to the components shown in Figure 18 (or Figure 19), the domain name detection device 11 may include more or fewer components than shown, or a combination of certain components, or a different component arrangement.
[0215] Through the description of the foregoing implementation manners, those skilled in the art can clearly understand that, for convenience and concise description, only the division of the above-mentioned functional units is used for illustration. In practical applications, the above-mentioned function allocation can be completed by different functional units as required, that is, the internal structure of the device is divided into different functional units to complete all or part of the functions described above. For the specific working process of the above-described system, device and unit, reference may be made to the corresponding process in the foregoing method embodiment, which will not be repeated here.
[0216] The embodiment of the present invention also provides a computer-readable storage medium, and the computer-readable storage medium stores instructions. When the computer executes the instruction, the computer executes each step in the method flow shown in the above method embodiment.
[0217] The embodiment of the present invention also provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the domain name detection method in the foregoing method embodiments.
[0218] The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a register, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above, or any other form of computer-readable storage medium known in the art. An exemplary storage medium is coupled to the processor, so that the processor can read information from the storage medium and write information to the storage medium. Of course, the storage medium may also be an integral part of the processor. The processor and the storage medium may be located in an Application Specific Integrated Circuit (ASIC). In the embodiment of the present invention, the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
[0219] Since the domain name detection device, computer-readable storage medium, and computer program product in the embodiment of the present invention can be applied to the above method, for the technical effects that can be obtained, reference may also be made to the above method embodiment, which will not be repeated here.
[0220] The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any changes or substitutions within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention.