
Chinese domain name similarity measurement method based on J-W distance

A Chinese domain name similarity measurement technology, applied in the field of network security, that addresses the insufficient accuracy and poor efficiency of existing methods and improves the accuracy and timeliness of Chinese domain name similarity measurement.

Active Publication Date: 2018-01-19
KUNMING UNIV OF SCI & TECH
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Compared with current Chinese domain name similarity measurement methods, the present invention mainly addresses the insufficient accuracy and poor efficiency of the prior art and aims to improve the accuracy and timeliness of Chinese domain name similarity measurement.



Examples


Embodiment 1

[0039] Embodiment 1: As shown in Figure 1, a Chinese domain name similarity measurement method based on J-W distance comprises the following specific steps:

[0040] Step 1: Obtain the domain name X to be detected and the target domain name Y;

[0041] Step 2: Split the domain name X to be detected and the target domain name Y at the Chinese full stop "。" or the period ".", ignore the network name and the domain name suffix, retain the domain name body, and generate the Chinese character sets of the domain name bodies, $x = \{x_1, x_2, \dots, x_p\}$ and $y = \{y_1, y_2, \dots, y_q\}$;

[0042] Step 3: According to the Unicode Chinese character stroke order table, traverse the Chinese character sets $x = \{x_1, x_2, \dots, x_p\}$ and $y = \{y_1, y_2, \dots, y_q\}$ obtained in Step 2. For each Chinese character $x_i$, $i \in [1, p]$, or $y_i$, $i \in [1, q]$, taken in set order, find the stroke order of the corresponding Chinese character, convert it according to the corresponding encoding rules, and generate...
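Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: `STROKE_CODES` is a hypothetical eight-character excerpt standing in for the full stroke order table, the per-character codes for 今, 令, 日, 科 and 技 are assumed splits whose concatenation reproduces the encoded strings quoted in Embodiment 3, and `domain_body` assumes the body is the label immediately before the suffix.

```python
import re

# Hypothetical excerpt of a stroke order table. The codes for 治, 病 and 冶
# are the ones quoted in Embodiment 2; the remaining entries are assumed
# splits of the encoded strings quoted in Embodiment 3.
STROKE_CODES = {
    "治": "44154251",
    "病": "4134112534",
    "冶": "4154251",
    "今": "3445",
    "令": "34454",
    "日": "2511",
    "科": "312344412",
    "技": "1211254",
}


def domain_body(domain: str) -> str:
    """Split on '.' or '。' and keep the label just before the suffix.

    Assumption: a name such as 'www.今日科技.中国' has the layout
    network name / body / suffix, so the body is the second-to-last label.
    """
    labels = [part for part in re.split(r"[.。]", domain) if part]
    if not labels:
        raise ValueError("empty domain name")
    return labels[-2] if len(labels) >= 2 else labels[0]


def encode_body(body: str) -> str:
    """Concatenate the stroke codes of each character, in order.

    Raises KeyError for characters missing from the excerpt table.
    """
    return "".join(STROKE_CODES[ch] for ch in body)


def encode_domain(domain: str) -> str:
    """Domain name -> body -> numeric stroke-code string."""
    return encode_body(domain_body(domain))


str_x = encode_domain("今日科技.中国")
str_y = encode_domain("令日科技.中国")
print(len(str_x), len(str_y))
```

Because every character becomes a run of digits, two characters that differ by a single stroke yield strings that differ by a single digit, which is what makes a string distance meaningful in the later steps.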

Embodiment 2

[0068] Embodiment 2: On the basis of Embodiment 1, the calculation of the number of matching characters m and the number of matching-character transpositions n is described further. Assume that the domain name bodies of the domain name X to be detected and the target domain name Y are "治病" ("cure disease") and "冶病" respectively. Looking up the Unicode Chinese character stroke order table gives the code "44154251" for "治", "4134112534" for "病", and "4154251" for "冶", so the generated encoded strings $str_x$ and $str_y$ are "441542514134112534" and "41542514134112534" respectively.

[0069] Calculate the matching window value MW:

[0070]

[0071] Using the detection matrix $I(X,Y)_{18 \times 17}$, calculate the number of matching characters m and the number of matching-character transpositions n:

[0072]

[0073]

[0074] In the table (matrix) above, "/" marks a position that lies outside the matching window MW, regardless of whe...
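The formulas and the detection matrix for this embodiment are not reproduced in this text, so the sketch below falls back on the textbook Jaro definition as an assumption: a matching window of floor(max(|str_x|, |str_y|) / 2) - 1, greedy left-to-right matching inside that window, and n counted as half the number of matched characters that are out of order. The function names are illustrative only, and the patent's own counting rules may differ in detail.

```python
def match_window(len_x: int, len_y: int) -> int:
    """Textbook Jaro matching window (assumed, not taken from the patent)."""
    return max(max(len_x, len_y) // 2 - 1, 0)


def jaro_counts(s1: str, s2: str) -> tuple[int, int]:
    """Return (m, n): matching characters and transpositions."""
    window = match_window(len(s1), len(s2))
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    m = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        # Take the first unused, equal character inside the window.
        for j in range(lo, hi):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                m += 1
                break
    matched1 = [c for c, u in zip(s1, used1) if u]
    matched2 = [c for c, u in zip(s2, used2) if u]
    out_of_order = sum(a != b for a, b in zip(matched1, matched2))
    return m, out_of_order // 2


def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]."""
    m, n = jaro_counts(s1, s2)
    if m == 0:
        return 0.0
    return (m / len(s1) + m / len(s2) + (m - n) / m) / 3


# Encoded strings from Embodiment 2.
str_x = "441542514134112534"
str_y = "41542514134112534"
print(match_window(len(str_x), len(str_y)))  # window for an 18 x 17 matrix
print(jaro_counts(str_x, str_y))
```

Under this assumed definition the window for the 18×17 case is 8, so a pair of positions more than 8 apart is never compared; that corresponds to the cells the table marks as outside the window.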

Embodiment 3

[0076] Embodiment 3: On the basis of Embodiment 1, the practical application of the present invention is described further. Assume that the domain name X to be detected and the target domain name Y are "今日科技.中国" ("Today's Technology.China") and "令日科技.中国" respectively. After initialization, the domain name bodies are "今日科技" and "令日科技". Looking up the corresponding Chinese character codes and applying the encoding rules gives the encoded strings $str_x$ and $str_y$, "344525113123444121211254" and "3445425113123444121211254" respectively.

[0077] Calculate the matching window value MW:

[0078]

[0079] Using the detection matrix $I(X,Y)_{24 \times 25}$, the number of matching characters is m = 24 and the number of matching-character transpositions is n = 8.

[0080] Calculate the Jaro Distance of the encoded strings $str_x$ and $str_y$:

[0081]

[0082] The longest common substring $str_{xy}$ has length $len_{xy} = 20$; further calculate, for the encoded strings $str_x$ and $str_y$, the Jaro-Winkle...
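The longest common substring length quoted above can be checked with a standard dynamic-programming routine, sketched below. The rule by which the patent combines this length with the Jaro-Winkler value is truncated in this text and is not reconstructed here; `winkler_boost` shows only the textbook Winkler prefix adjustment, with the scaling factor `p = 0.1` and the 4-character prefix cap as conventional assumptions rather than values taken from the patent.

```python
def longest_common_substring_len(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0] * (len(b) + 1)
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                # Extend the common run that ended at the previous characters.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def winkler_boost(jaro_sim: float, s1: str, s2: str,
                  p: float = 0.1, max_prefix: int = 4) -> float:
    """Textbook Jaro-Winkler: reward a shared prefix of up to max_prefix."""
    prefix = 0
    for c1, c2 in zip(s1, s2):
        if c1 != c2 or prefix == max_prefix:
            break
        prefix += 1
    return jaro_sim + prefix * p * (1.0 - jaro_sim)


# Encoded strings from Embodiment 3.
str_x = "344525113123444121211254"
str_y = "3445425113123444121211254"
print(longest_common_substring_len(str_x, str_y))  # 20
```

The two strings differ only by the extra "4" after the shared prefix "3445", so the longest shared run is the 20-digit tail, in agreement with $len_{xy} = 20$.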


Abstract

The invention relates to a Chinese domain name similarity measurement method based on J-W distance and belongs to the technical field of network security. Each Chinese character is encoded through the Unicode Chinese character stroke order table and thereby mapped to a numeric character string; the Jaro-Winkler Distance algorithm from the machine learning field is then introduced and combined with the longest common substring to measure the similarity of Chinese domain names. The method comprises the following steps: first, obtain the domain name to be detected and the target domain name, and initialize both to generate the domain name bodies; then, encode the domain name bodies according to the Unicode Chinese character stroke order table to generate numeric character strings, and use these strings as the input of the Jaro-Winkler Distance algorithm to generate a detection matrix; finally, combine the result with the longest common substring of the numeric character strings and calculate their similarity according to the relevant rule. The similarity of the numeric character strings effectively represents the similarity between the Chinese characters.

Description

Technical field

[0001] The invention relates to a method for measuring the similarity of Chinese domain names based on J-W distance, and belongs to the technical field of network security.

Background

[0002] With the development and popularization of the Internet, Chinese domain names have gradually become an important part of internationalized domain names. At the same time, phishing attacks against Chinese domain names are increasing day by day, and their forms are becoming more and more complex. Because many Chinese characters look alike, and because people tend to read quickly, a certain amount of visual misjudgment is inevitable.

[0003] Traditional domain name similarity measurement methods apply only to English domain names and perform poorly on Chinese domain names. Moreover, at present, domestic research on the similarity me...


Application Information

IPC(8): G06F17/30, G06F17/22
Inventors: 龙华, 祁俊辉, 邵玉斌, 杜庆治
Owner: KUNMING UNIV OF SCI & TECH