Method and device for calculating similarity of Chinese character strings on the basis of edit distance

An edit-distance technique for Chinese characters, applied in calculation, electrical digital data processing, special data processing applications, etc., that improves the accuracy and precision of Chinese character string matching.

Inactive. Publication Date: 2013-11-20
SHENZHEN AUDAQUE DATA TECH
5 Cites, 49 Cited by

AI Technical Summary

Problems solved by technology

Although the algorithm based on Chinese character shape takes into acc

Embodiment Construction

[0028] To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described here serve only to explain the present invention, not to limit it.

[0029] Embodiments of the present invention provide a method and device for calculating the similarity of Chinese character strings based on edit distance. The invention uses the four-corner method to convert the Chinese characters in a string into four-corner codes, and calculates the similarity of two Chinese characters from the edit distance between their codes. On this basis, the character similarity replaces the weight used in the edit distance, and the similarity of the strings is then calculated. The invention converts Chinese characters into digital strings for compariso...
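The character-level step can be sketched as follows. This is a minimal illustration, not the patented formula: the normalization `1 - distance / longest_length` is an assumption, the function names `edit_distance` and `code_similarity` are hypothetical, and the digit strings used in any call are placeholders rather than the verified four-corner codes of particular characters.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic unit-cost Levenshtein distance, keeping only two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def code_similarity(code_a: str, code_b: str) -> float:
    """Similarity in [0, 1] between two four-corner codes.

    Assumed normalization: 1 - distance / length of the longer code.
    """
    longest = max(len(code_a), len(code_b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(code_a, code_b) / longest
```

Under this assumed normalization, two codes that differ in one of four digits score 0.75, and codes with no digit in common at any position score 0.0.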

Abstract

An embodiment of the invention provides a method for calculating the similarity of Chinese character strings on the basis of edit distance. The method includes: calculating the similarity of Chinese characters in character strings to be compared; calculating the similarity of the Chinese character strings to be compared. According to the method, the Chinese characters in the character strings are converted into four-corner codes by the four-corner code method; the similarity of the Chinese characters is accordingly calculated on the basis of edit distance; on this basis, the weight of edit distance is replaced with the similarity of the Chinese characters to calculate the similarity of the character strings. The Chinese characters are converted into numeric strings for comparison, thus matching of the Chinese characters is more precise; the weight of the edit distance is replaced with the similarity of the Chinese characters to calculate the similarity of the character strings, thus the edit distance algorithm is applied to matching of the Chinese character strings under the Chinese language environment and matching results are more accurate. In addition, another embodiment of the invention provides a device for calculating the similarity of Chinese character strings on the basis of edit distance.
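One way to read "the weight of edit distance is replaced with the similarity of the Chinese characters" is that the substitution cost between two characters becomes `1 - similarity` instead of a fixed 1. The sketch below follows that reading, which is an assumption; the source excerpt does not give the exact formula. The names `weighted_string_similarity` and `toy_char_sim`, the final normalization by the longer string length, and the 0.75 value in the toy table are all hypothetical.

```python
from typing import Callable

# Made-up similarity value for one visually similar pair (illustration only).
_TOY_TABLE = {frozenset(("己", "已")): 0.75}


def toy_char_sim(a: str, b: str) -> float:
    """Hypothetical character similarity: 1.0 if equal, table lookup otherwise."""
    if a == b:
        return 1.0
    return _TOY_TABLE.get(frozenset((a, b)), 0.0)


def weighted_string_similarity(
    s: str, t: str, char_sim: Callable[[str, str], float]
) -> float:
    """Edit distance whose substitution cost is 1 - char_sim, mapped to [0, 1]."""
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = float(i)  # delete all of s[:i]
    for j in range(1, cols):
        d[0][j] = float(j)  # insert all of t[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            substitute = 1.0 - char_sim(s[i - 1], t[j - 1])
            d[i][j] = min(
                d[i - 1][j] + 1.0,             # deletion
                d[i][j - 1] + 1.0,             # insertion
                d[i - 1][j - 1] + substitute,  # similarity-weighted substitution
            )
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - d[-1][-1] / longest
```

With the toy table, `weighted_string_similarity("自己", "自已", toy_char_sim)` charges 0.25 for the one near-match instead of a full substitution, so the pair scores higher than it would under unit costs.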

Description

Technical field

[0001] The invention relates to the field of Chinese character string similarity, in particular to a method and device for calculating the similarity of Chinese character strings based on edit distance.

Background technique

[0002] Comparing the similarity of Chinese character strings is a common technique in fields such as character string matching, text comparison, and information extraction. Different applications adopt different techniques for measuring the similarity of Chinese strings. Common techniques include matching algorithms based on edit distance, matching algorithms based on glyphs and pronunciation, and the Smith-Waterman distance algorithm.

[0003] In the edit distance algorithm, the similarity between two strings is found as follows: first, the edit distance matrix is constructed in advance; second, the values of the matrix cells are calculated from left to right and top to bottom; third, the c...
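The first two steps of the background algorithm (build the matrix, then fill it left to right and top to bottom) correspond to the standard unit-cost Levenshtein computation. The sketch below returns the whole matrix so the fill order is visible. The third step is cut off in this excerpt, so reading the distance from the bottom-right cell is the standard convention rather than something confirmed by the text; the function name `edit_distance_matrix` is hypothetical.

```python
from typing import List


def edit_distance_matrix(source: str, target: str) -> List[List[int]]:
    """Build the full (len(source)+1) x (len(target)+1) edit distance matrix."""
    rows, cols = len(source) + 1, len(target) + 1
    matrix = [[0] * cols for _ in range(rows)]

    # Step 1: initialize the first column and first row.
    for i in range(rows):
        matrix[i][0] = i
    for j in range(cols):
        matrix[0][j] = j

    # Step 2: fill the remaining cells top to bottom, left to right.
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            matrix[i][j] = min(
                matrix[i - 1][j] + 1,         # deletion
                matrix[i][j - 1] + 1,         # insertion
                matrix[i - 1][j - 1] + cost,  # substitution or match
            )
    return matrix


if __name__ == "__main__":
    for row in edit_distance_matrix("kitten", "sitting"):
        print(row)
```

In this classic form every substitution costs 1 regardless of which characters are involved, which is the fixed weight the invention proposes to replace.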

Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F40/53
Inventor: 王平, 贾西贝
Owner: SHENZHEN AUDAQUE DATA TECH