Synonym expansion method and device for text duplication detection

A synonym expansion method and device applied to text duplication detection. It addresses the problem that an overly large expansion set degrades detection efficiency, and it improves both the efficiency and the accuracy of detection.

Inactive Publication Date: 2012-08-29
孙星明

AI Technical Summary

Problems solved by technology

This method overcomes the problem that in the expansion of synonyms, the expa...




Detailed Description of the Embodiments

[0023] In order to make the purpose, technical solutions and advantages of the present invention clearer, the technical solutions proposed in the embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0024] The first stage of the embodiment of the present invention is text preprocessing. Referring to Figure 1, it includes the following steps:

[0025] Step 1: Segment the suspicious text into words using an existing natural language processing tool.

[0026] Step 2: Remove the stop words from the suspicious text using a stop word list.

[0027] Step 3: Tag the verbs, nouns and adjectives in the resulting text with their parts of speech, again using an existing natural language processing tool.

[0028] For a given suspicious text, after the above preprocessing steps, the text is obtained.
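The three preprocessing steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the regular-expression tokenizer, the `STOP_WORDS` set and the `TOY_POS_LEXICON` dictionary are hypothetical stand-ins for the "existing natural language processing tools" and the stop word list mentioned in the text, and all function names are invented for the example.

```python
import re

# Hypothetical stop word list; a real system would load a full list.
STOP_WORDS = {"the", "a", "an", "of", "is", "to", "and"}

# Hypothetical toy part-of-speech lexicon standing in for a real tagger.
TOY_POS_LEXICON = {
    "student": "NOUN",
    "paper": "NOUN",
    "copied": "VERB",
    "famous": "ADJ",
    "quickly": "ADV",
}

# Only verbs, nouns and adjectives are kept for synonym expansion.
CONTENT_TAGS = {"VERB", "NOUN", "ADJ"}


def segment(text):
    """Step 1: split the text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Step 2: drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def tag_content_words(tokens):
    """Step 3: tag each token and keep verbs, nouns and adjectives."""
    tagged = [(t, TOY_POS_LEXICON.get(t, "OTHER")) for t in tokens]
    return [(t, pos) for t, pos in tagged if pos in CONTENT_TAGS]


def preprocess(text):
    """Run the three steps in order on a suspicious text."""
    return tag_content_words(remove_stop_words(segment(text)))


print(preprocess("The student quickly copied a famous paper."))
```

In practice the tokenizer and tagger would be replaced by a real word segmenter and part-of-speech tagger for the target language; the pipeline order is the part taken from the description.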

[0029]

[0030] Referring to Figure 2, synonym expansion is carried out on the text obtained after preprocessing. In this process, because context inf...


Abstract

The invention discloses a synonym expansion method and device for text duplication detection. A text preprocessing unit removes the stop words from a suspicious text and tags parts of speech, with verbs, nouns and adjectives taken as the objects to be processed. The synonyms of each individual word are retrieved and their Cartesian product is computed, giving an initial expansion set of all word collocations in the suspicious text. The initial expansion set is then compared with an actual corpus, word collocations that cannot occur in a real language environment are filtered out, and the reduced set becomes the final expansion set. During duplication detection, words are given different weights according to the collocation results, and these weights serve as the basis for computing the detection result. The method or device of the embodiments effectively addresses synonym replacement in text duplication, runs more efficiently, and greatly improves the accuracy of duplication detection.
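The expansion and filtering described in the abstract can be illustrated with a short Python sketch. It rests on assumptions: `SYNONYMS` and `CORPUS_COLLOCATIONS` are hypothetical toy data, the function names are invented for the example, and the corpus comparison is reduced to exact set membership. How weights are assigned is not specified in the available text, so that step is not sketched.

```python
from itertools import product

# Hypothetical synonym dictionary; each word maps to itself plus its synonyms.
SYNONYMS = {
    "buy": ["buy", "purchase"],
    "car": ["car", "automobile"],
}

# Hypothetical set of word collocations observed in an actual corpus.
CORPUS_COLLOCATIONS = {
    ("buy", "car"),
    ("purchase", "car"),
    ("purchase", "automobile"),
}


def initial_expansion(words):
    """Cartesian product of the synonym lists of every word."""
    candidates = [SYNONYMS.get(w, [w]) for w in words]
    return list(product(*candidates))


def filter_by_corpus(expansion, corpus):
    """Keep only the collocations that occur in the corpus."""
    return [c for c in expansion if c in corpus]


initial = initial_expansion(["buy", "car"])
final = filter_by_corpus(initial, CORPUS_COLLOCATIONS)

print(len(initial), "candidates before filtering")
print(final)
```

The initial set grows multiplicatively with the number of words and synonyms, which is why pruning it against observed collocations keeps the later comparison tractable.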

Description

Technical field

[0001] The invention generally relates to synonym expansion in text duplication detection, and in particular to a method and device that prevent the expansion set from becoming too large during synonym expansion.

Background

[0002] With the rapid development of computer technology and the Internet, and the massive growth of digital information, preventing digital information from being illegally copied and disseminated has become an urgent problem. Among the forms of digital information duplication, text duplication is the most prevalent. The purpose of text duplication detection is to find the plagiarized parts of a text by comparing the suspicious text with a specified corpus. This comparison works well for direct copying of text, but it cannot handle synonym replacement in the text. To address this, some duplication detection methods intro...


Application Information

IPC(8): G06F17/27
Inventor 孙星明
Owner 孙星明