Chinese domain name similarity measurement method based on J-W distance
A Chinese domain name similarity measurement technology, applied in the field of network security. It addresses the insufficient accuracy and poor efficiency of existing methods and improves the accuracy and timeliness of Chinese domain name similarity measurement.
Examples
Embodiment 1
[0039] Embodiment 1: As shown in Figure 1, a Chinese domain name similarity measurement method based on J-W distance comprises the following specific steps:
[0040] Step1: Obtain the domain name X to be detected and the target domain name Y;
[0041] Step2: Split the domain name X to be detected and the target domain name Y at each period "." or Chinese full stop "。", ignore the network name and the domain name suffix, retain the domain name body, and generate the Chinese character sets of the domain name bodies, x: {x_1, x_2, …, x_p} and y: {y_1, y_2, …, y_q};
[0042] Step3: According to the Unicode Chinese character stroke order table, traverse the Chinese character sets x: {x_1, x_2, …, x_p} and y: {y_1, y_2, …, y_q} of the domain name bodies obtained in Step2; for each Chinese character x_i, i∈[1,p], or y_i, i∈[1,q], taken in set order, look up the stroke order of that character, convert it according to the corresponding encoding rules, and generate...
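Steps 2 and 3 can be illustrated with a minimal Python sketch. Several things here are assumptions rather than part of the source: the function names `domain_body` and `encode`, the treatment of a leading "www" label as the network name and of the last label as the suffix, and the tiny hand-written `STROKE_TABLE`, which stands in for a full stroke order table. The characters 今, 日, 科, 技 and 令 are assumed to be the originals behind the translated names used in Embodiment 3, and the digit meanings given in the comment are likewise an assumption about the encoding rules.

```python
import re

# Hypothetical excerpt of a stroke order table. Assumed digit meanings:
# 1 = horizontal, 2 = vertical, 3 = left-falling, 4 = dot/right-falling, 5 = turning.
STROKE_TABLE = {
    "今": "3445",
    "令": "34454",
    "日": "2511",
    "科": "312344412",
    "技": "1211254",
}

def domain_body(name):
    """Return the domain name body, dropping the network name and the suffix."""
    # Split on the ASCII period or the Chinese full stop.
    labels = [part for part in re.split(r"[.。]", name) if part]
    if not labels:
        raise ValueError("empty domain name")
    # Assumption: the network name is a leading "www" label.
    if len(labels) > 1 and labels[0].lower() == "www":
        labels = labels[1:]
    # Assumption: the suffix is the single last label.
    if len(labels) > 1:
        labels = labels[:-1]
    return labels[-1]

def encode(body):
    """Concatenate the stroke codes of each character, in character order."""
    return "".join(STROKE_TABLE[ch] for ch in body)

print(encode(domain_body("今日科技.中国")))
```

Multi-label suffixes such as ".com.cn" are not handled; a real implementation would need a suffix list.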
Embodiment 2
[0068] Embodiment 2: On the basis of Embodiment 1, the calculation of the number of matching characters m and the number of transpositions n among matching characters is further described. Assume that the domain name bodies of the domain name X to be detected and the target domain name Y are "cure disease" and "cure disease" respectively (the two names differ only in their first character). Looking up the corresponding Chinese character codes in the Unicode Chinese character stroke order table gives "44154251" for "cure", "4134112534" for "disease", and "4154251" for "铁"; the generated encoded strings str_x and str_y are "441542514134112534" and "41542514134112534" respectively.
[0069] Calculate the matching window value MW:
[0070]
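The formula for this paragraph is not preserved in the text. As a hedged illustration only, assuming the matching window follows the standard Jaro definition (half the longer string length, rounded down, minus one), the encoded strings above with lengths 18 and 17 would give:

```latex
MW = \left\lfloor \frac{\max(|str_x|,\, |str_y|)}{2} \right\rfloor - 1
   = \left\lfloor \frac{\max(18,\, 17)}{2} \right\rfloor - 1
   = 8
```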
[0071] Using the detection matrix I(X,Y)_{18×17}, calculate the number of matching characters m and the number of transpositions n among matching characters:
[0072]
[0073]
[0074] As shown in the table (matrix) above, "/" marks a position that lies outside the matching window MW, regardless of whe...
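Since the matrix itself is not preserved, the following sketch shows one way such a detection matrix could be built. It assumes the standard Jaro window, and the cell values "1" (characters equal) and "0" (characters differ) are an assumption; only the "/" marker for out-of-window positions comes from the text. The function name `detection_matrix` is hypothetical.

```python
def detection_matrix(s, t):
    """Build a len(s) x len(t) matrix of "/", "1" and "0" cells."""
    # Assumed window: standard Jaro definition.
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    rows = []
    for i, a in enumerate(s):
        row = []
        for j, b in enumerate(t):
            if abs(i - j) > window:
                row.append("/")   # outside the matching window
            else:
                row.append("1" if a == b else "0")  # assumed cell values
        rows.append(row)
    return rows

matrix = detection_matrix("441542514134112534", "41542514134112534")
for row in matrix:
    print(" ".join(row))
```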
Embodiment 3
[0076] Embodiment 3: On the basis of Embodiment 1, the practical application of the present invention is further described. Assume that the domain name X to be detected and the target domain name Y are "Today's Technology.China" and "Lingri Technology.China"; after initialization, the domain name bodies are "Today's Technology" and "Lingri Technology". Looking up the corresponding Chinese character codes, the encoded strings str_x and str_y generated according to the rules are "344525113123444121211254" and "3445425113123444121211254" respectively.
[0077] Calculate the matching window value MW:
[0078]
[0079] Using the detection matrix I(X,Y)_{24×25}, the number of matching characters is calculated as m=24 and the number of transpositions among matching characters as n=8.
[0080] Compute the Jaro distance of the encoded strings str_x and str_y:
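The counts above can be cross-checked with a short sketch of the standard Jaro matching procedure. This is an assumption about how the matrix is evaluated, not the patent's own procedure: each character of the first string is greedily paired with the first unused equal character of the second string inside the window, and n is taken as half the number of paired positions whose characters differ in order. The function name `match_counts` is hypothetical. Under these assumptions the sketch yields the same m and n as stated.

```python
def match_counts(s, t):
    """Return (m, n): matching characters and transpositions, standard Jaro style."""
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    t_used = [False] * len(t)
    s_matched = []
    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(len(t), i + window + 1)
        # Pair ch with the first unused equal character inside the window.
        for j in range(lo, hi):
            if not t_used[j] and t[j] == ch:
                t_used[j] = True
                s_matched.append(ch)
                break
    # Matched characters of t, in their original order.
    t_matched = [t[j] for j in range(len(t)) if t_used[j]]
    m = len(s_matched)
    out_of_order = sum(a != b for a, b in zip(s_matched, t_matched))
    return m, out_of_order // 2

print(match_counts("344525113123444121211254", "3445425113123444121211254"))
# → (24, 8)
```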
[0081]
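The formula for this paragraph is not preserved in the text. As a hedged worked example, assuming the standard Jaro definition with the values m=24, n=8, |str_x|=24 and |str_y|=25 given above:

```latex
d_j = \frac{1}{3}\left(\frac{m}{|str_x|} + \frac{m}{|str_y|} + \frac{m - n}{m}\right)
    = \frac{1}{3}\left(\frac{24}{24} + \frac{24}{25} + \frac{24 - 8}{24}\right)
    \approx 0.8756
```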
[0082] The longest common substring str_xy has length len_xy=20; further calculate, for the encoded strings str_x and str_y, the Jaro-Winkle...
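The longest common substring length quoted above can be checked with a standard dynamic-programming sketch. The function name `longest_common_substring` is hypothetical, and the sketch covers only this one quantity; how the method combines len_xy with the Jaro distance is not reproduced here because the text is cut off.

```python
def longest_common_substring(s, t):
    """Return the longest contiguous substring shared by s and t."""
    best_len, best_end = 0, 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                # Extend the common run ending at s[i-2], t[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return s[best_end - best_len:best_end]

str_xy = longest_common_substring(
    "344525113123444121211254", "3445425113123444121211254"
)
print(len(str_xy))  # → 20
```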