Method for comparing Chinese similarity based on context relation

A context-based similarity technique, applied in the field of Chinese similarity comparison, that addresses the underuse of context information in existing comparison methods.

Status: Inactive
Publication Date: 2012-01-11
BEIHANG UNIV
Cites: 4 | Cited by: 22

AI Technical Summary

Problems solved by technology

[0007] Aiming at the problem that the existing VSM-based Chinese similarity comparison method does not make full use of context




Embodiment Construction

[0018] The technical solutions of the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments.

[0019] As shown in Figure 1, the context-association-based Chinese similarity comparison method of the present invention comprises the following steps:

[0020] Step 1: Read text S and text D to be compared, and perform word segmentation and indexing on the text streams (Text Stream) of the two texts.

[0021] Each word with independent meaning that is split out of a sentence is called a token (segmented word). Because written Chinese has no explicit delimiters between words, Chinese word segmentation is an essential foundation for machine translation, classification, keyword extraction and information retrieval. The method of the present invention adopts an adaptive Chinese-English word segmentation algorithm based on binary iteration (references: Cao Yonggang, Cao Yuzhong, etc., "Adaptive Chinese Word Segmentati...
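Step 1 can be illustrated with a minimal sketch. The adaptive binary-iteration segmenter cited above is not reproduced here; as a loudly labeled stand-in, the hypothetical `tokenize` function below emits overlapping character bigrams for Chinese runs and whole lowercase words for ASCII runs. The function names and the choice of token positions as index entries are assumptions made for illustration, not details taken from the patent.

```python
import re
from collections import defaultdict

# Stand-in tokenizer (NOT the patent's adaptive segmenter): it matches runs of
# common CJK ideographs or runs of ASCII letters/digits and ignores the rest.
_RUN = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z0-9]+")


def tokenize(text):
    """Split text into tokens: bigrams for Chinese runs, words for ASCII runs."""
    tokens = []
    for run in _RUN.findall(text):
        if run[0].isascii():
            tokens.append(run.lower())      # English word or number
        elif len(run) == 1:
            tokens.append(run)              # isolated Chinese character
        else:
            # Overlapping character bigrams approximate Chinese words.
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


def build_inverted_list(tokens):
    """Map each token to the ordered list of positions where it occurs."""
    index = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok].append(pos)
    return dict(index)


if __name__ == "__main__":
    toks = tokenize("相似度 Text")
    print(toks)
    print(build_inverted_list(toks))
```

Storing positions rather than bare document identifiers is what would let a later step reason about where matching tokens sit relative to one another in each text.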


Abstract

The invention provides a context-relation-based method for comparing Chinese text similarity, applied in the technical field of Chinese similarity comparison. The method comprises the following steps: first, performing word segmentation and indexing on the two texts to be compared and building an inverted list for each text; second, performing similarity detection on the inverted lists to obtain suspicious similar segments; and finally, merging the suspicious similar segments into similar text blocks, taking context relations into account while the blocks are generated. Because the method first finds small suspicious similar segments and then merges them, it eases the tension between the granularity of the vector space model and the false-positive and miss rates, and thereby achieves similarity comparison of the two texts.
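The three stages described in the abstract can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the patented procedure: here a "suspicious similar segment" is simply a pair of positions holding the same token, and "context" is modeled as a hypothetical `max_gap` limit on how far apart consecutive matches may lie in both texts. The names `suspicious_matches`, `merge_matches`, `max_gap` and `min_hits` are invented for this example.

```python
from collections import defaultdict


def inverted_list(tokens):
    """Map each token to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok].append(pos)
    return index


def suspicious_matches(s_tokens, d_tokens):
    """Return sorted (pos_in_S, pos_in_D) pairs whose tokens are equal."""
    d_index = inverted_list(d_tokens)
    return sorted(
        (i, j) for i, tok in enumerate(s_tokens) for j in d_index.get(tok, ())
    )


def merge_matches(matches, max_gap=2, min_hits=2):
    """Greedily chain matches that lie close together in both texts.

    Assumption: a match joins a block when it follows the block's last match
    by at most max_gap positions in S and in D. Blocks with fewer than
    min_hits matches are discarded as noise.
    """
    blocks = []
    for i, j in matches:
        for block in blocks:
            last_i, last_j = block[-1]
            if 0 < i - last_i <= max_gap and 0 < j - last_j <= max_gap:
                block.append((i, j))
                break
        else:
            blocks.append([(i, j)])
    # Report each block as ((start_S, end_S), (start_D, end_D)).
    return [
        ((b[0][0], b[-1][0]), (b[0][1], b[-1][1]))
        for b in blocks
        if len(b) >= min_hits
    ]


if __name__ == "__main__":
    s = "a b c x d e".split()
    d = "q a b c y d e".split()
    print(merge_matches(suspicious_matches(s, d), max_gap=2))
    print(merge_matches(suspicious_matches(s, d), max_gap=1))
```

In this toy input, one token differs between the two texts (`x` versus `y`). With `max_gap=2` the matches on either side of the difference are joined into a single block; with `max_gap=1` they stay as two separate blocks. That is the sense in which small matches plus gap-tolerant merging can survive partial word replacement.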

Description

Technical Field

[0001] The invention relates to the technical field of Chinese similarity comparison, in particular to a context-based Chinese similarity comparison method.

Background

[0002] Chinese similarity comparison technology is widely used in plagiarism detection, information retrieval, machine translation, text mining, webpage deduplication and other fields. Because natural language, and Chinese in particular, is difficult for computers to understand, the problem remains an active and difficult research topic.

[0003] The purpose of a similarity comparison method is to judge whether two texts are "similar". "Similarity" here refers to being "different in form but similar in spirit" at the semantic level. That is, the relatedness of two "similar" articles should still be detectable after (1) changes in grammatical structure; (2) reordering of words; (3) replacement of some words; (4) addition of other content. The similarity depends on factors such as t...


Application Information

IPC(8): G06F17/27
Inventors: 赵长海, 晏海华, 郎钰泽
Owner: BEIHANG UNIV