Method for matching Chinese similarity

A Chinese similarity matching technology, applied in the field of text similarity matching in search, that addresses the problem that editing operations in Chinese cannot be summarized by simple editing operations.

Active Publication Date: 2011-07-13
TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

Editing operations in Chinese cannot be summarized by simple insertion, deletion, and replacement operations.


Embodiment

[0040] Referring to Figure 1, which shows a flow chart of the Chinese similarity matching method of the present invention, the method specifically includes:

[0041] Step S101, obtaining two character strings A and B to be compared;

[0042] According to the requirements in practical applications, obtain the two character strings A and B that need to be compared currently.

[0043] Preferably, the method also includes:

[0044] Create a comparison table Table1 from Chinese characters to Pinyin;

[0045] Create a comparison table Table2 from Chinese characters to Wubi;

[0046] Establish the Chinese character word frequency statistical table Table3;

[0047] Establish the Chinese character error information statistics table Table4.

[0048] In practical applications, mapping tables are established to obtain the Chinese character pinyin comparison table Table1, the Chinese character Wubi comparison table Table2, the word frequency statistics table Table3, error information statist...
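The four tables listed above can be pictured as simple lookup structures. The following is a minimal sketch only, not the patent's implementation: the variable and function names are hypothetical, the sample entries are illustrative (the Wubi codes in particular should be checked against a real Wubi dictionary), and it assumes that Table3 counts individual characters and that Table4 counts (intended, typed) character pairs, neither of which is specified in the text shown here.

```python
from collections import Counter

# Table1: Chinese character -> pinyin (tiny illustrative sample).
table1_pinyin = {"清": "qing", "青": "qing", "华": "hua", "花": "hua"}

# Table2: Chinese character -> Wubi code (illustrative values, unverified).
table2_wubi = {"清": "igeg", "青": "gef", "华": "wxfj", "花": "awx"}

# Table3: frequency statistics (assumed here to be per character).
table3_freq = Counter()

# Table4: error statistics (assumed here to be (intended, typed) pairs).
table4_errors = Counter()


def record_text(text):
    """Update the frequency table with the known characters in text."""
    table3_freq.update(ch for ch in text if ch in table1_pinyin)


def record_error(intended, typed):
    """Count one observed confusion of `intended` with `typed`."""
    table4_errors[(intended, typed)] += 1


def to_pinyin(s):
    """Map each character to its pinyin; unknown characters pass through."""
    return [table1_pinyin.get(ch, ch) for ch in s]


def to_wubi(s):
    """Map each character to its Wubi code; unknown characters pass through."""
    return [table2_wubi.get(ch, ch) for ch in s]
```

With tables of this shape, strings A and B from step S101 can be converted to pinyin or Wubi sequences before any similarity is computed, and the two statistics tables can be updated as new text and corrections are observed.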


Abstract

The invention provides a Chinese similarity matching method. An edit distance formula and keyboard fingering rules are used to obtain the edit similarity of the pinyin corresponding to the Chinese text, which reflects how easily the strings are confused during typing. The pronunciation rules of the initials and finals of Chinese characters are used to obtain the initial similarity and final similarity of the character strings, and common fuzzy sounds in dialects or everyday pronunciation are incorporated to calculate the pronunciation similarity between character strings. Because glyph shape is one of the most important characteristics of Chinese, a glyph encoding, namely Wubi (five-stroke) encoding, is used to calculate the glyph similarity between character strings. Information is collected and statistics are computed at the same time to update the data. The above similarities are combined to obtain the overall similarity of Chinese words. The method fully considers factors such as Chinese spelling habits, user input habits, keyboard layout, Mandarin pronunciation rules, dialects, common mispronunciations, and Chinese character glyphs, combines them with statistical regularities, and comprehensively evaluates the similarity between Chinese words.
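To make the pronunciation part of the abstract concrete, the sketch below splits a pinyin syllable into initial and final, scores each part with a small table of commonly confused ("fuzzy") pairs, and then combines several similarity scores with a weighted sum. Everything specific here is an assumption made for illustration: the fuzzy pairs are a small well-known subset, the 0.8 fuzzy score, the equal split between initial and final, the weights, and the weighted-sum combination itself are hypothetical and are not taken from the patent.

```python
# Pinyin initials, two-letter ones first so that "zh" is matched before "z".
# "y" and "w" are treated as initials, following the usual pinyin convention.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l", "g",
            "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]

# Illustrative subset of commonly confused pairs (assumption, not the patent's list).
FUZZY_INITIALS = {frozenset(p) for p in [("z", "zh"), ("c", "ch"),
                                         ("s", "sh"), ("n", "l")]}
FUZZY_FINALS = {frozenset(p) for p in [("an", "ang"), ("en", "eng"),
                                       ("in", "ing")]}


def split_syllable(syllable):
    """Split a toneless pinyin syllable into (initial, final)."""
    for ini in INITIALS:
        if syllable.startswith(ini):
            return ini, syllable[len(ini):]
    return "", syllable  # zero-initial syllable such as "an"


def part_similarity(x, y, fuzzy_pairs, fuzzy_score=0.8):
    """1.0 for identical parts, fuzzy_score for a fuzzy pair, else 0.0."""
    if x == y:
        return 1.0
    if frozenset((x, y)) in fuzzy_pairs:
        return fuzzy_score
    return 0.0


def pronunciation_similarity(syl_a, syl_b):
    """Average of initial similarity and final similarity (hypothetical weighting)."""
    ini_a, fin_a = split_syllable(syl_a)
    ini_b, fin_b = split_syllable(syl_b)
    s_ini = part_similarity(ini_a, ini_b, FUZZY_INITIALS)
    s_fin = part_similarity(fin_a, fin_b, FUZZY_FINALS)
    return 0.5 * s_ini + 0.5 * s_fin


def overall_similarity(edit_sim, pron_sim, glyph_sim, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three similarities; the weights are placeholders."""
    w_e, w_p, w_g = weights
    return w_e * edit_sim + w_p * pron_sim + w_g * glyph_sim
```

Under these assumptions, "zhang" and "zang" score high because they differ only by a fuzzy initial pair, while syllables with unrelated initials and finals score zero. The inputs to `overall_similarity` are assumed to be already normalized to the range 0 to 1.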

Description

Technical field

[0001] The invention relates to the technical field of text similarity matching in search, and in particular to a Chinese similarity matching method.

Background

[0002] A string similarity function measures the similarity between two strings. It is a basic technique in string matching, text comparison, and information extraction. Its input is usually two strings, identical or different, and it returns a definite integer value. The higher the similarity between the two strings, the greater the return value. This technology is also widely used in computational biology and signal processing.

[0003] Depending on the application, there are many classic similarity functions to choose from. For example: edit distance (also called Levenshtein distance), which considers three editing operations - insertion, deletion and replaceme...
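For reference, the classic edit distance mentioned in the background can be sketched as follows. This is the standard textbook dynamic-programming formulation with unit costs, not the modified formula of the invention. The `edit_similarity` helper is a hypothetical normalization, added only to show one common way of turning a distance into a score where larger means more similar.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and replacements turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # replace or match
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Hypothetical normalization to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

For example, `edit_distance("qinghua", "qinhua")` is 1, because a single deletion of "g" converts one pinyin string into the other.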


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 李国良, 黄维篁, 冯建华
Owner: TSINGHUA UNIV