Text similarity detection device

A text similarity detection method, applied in the field of text similarity detection, that addresses the poor practical performance, limited applicability, and degraded similarity calculation of existing approaches, so as to save human resources and improve discrimination accuracy and speed.

Inactive Publication Date: 2015-06-03
CHINA AGRI UNIV

AI Technical Summary

Problems solved by technology

First, Chinese word segmentation errors directly affect the subsequent similarity calculation.
Second, because Chinese lacks a large-scale dictionary comparable to the English WordNet, dictionary-based word similarity calculation is rarely used in automatic plagiarism detection for Chinese papers, or performs poorly in practice. College papers are often highly professional and domain-specific, so detecting plagiarism well in such professional papers remains an open problem.

Embodiment

[0051] Step 1. Automatically build a class dictionary from the classification labels of online encyclopedia entries.

[0052] Since the classification label of an online encyclopedia entry usually gives the parent node of the entry, all ancestor nodes of a term c can be extracted automatically by following the classification labels iteratively.
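
The iterative extraction described in [0052] can be sketched as follows. This is only a minimal illustration, not the patent's procedure: the `CATEGORY_LABELS` mapping, its toy entries, and the function name `extract_ancestors` are all hypothetical, and real encyclopedia labels would have to be fetched and cleaned first.

```python
from collections import deque

# Hypothetical toy data: entry -> classification labels (its direct parent nodes).
CATEGORY_LABELS = {
    "wheat": ["cereal crop"],
    "cereal crop": ["crop"],
    "crop": ["plant"],
    "plant": [],
}

def extract_ancestors(term, labels):
    """Follow classification labels upward, level by level.

    Returns {ancestor: level}, where level 1 is a direct parent.
    """
    levels = {}
    queue = deque([(term, 0)])
    while queue:
        node, depth = queue.popleft()
        for parent in labels.get(node, []):
            # Skip the term itself and already-seen nodes so label cycles terminate.
            if parent != term and parent not in levels:
                levels[parent] = depth + 1
                queue.append((parent, depth + 1))
    return levels

ancestors = extract_ancestors("wheat", CATEGORY_LABELS)
```

A breadth-first walk is used here so that each ancestor is recorded at its shortest distance from the term when several label paths lead to it.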

[0053] In the class dictionary, each term c has a set of <ancestor node p_j, weight w_j> pairs. Each ancestor node p_j is a hypernym of term c in the real ontology, and its weight w_j reflects the relative relationship (that is, the relative spatial distance) between p_j and c in the real ontology. The larger the weight w_j, the smaller the spatial distance (that is, the closer ancestor node p_j is to term c in the real ontology), and vice versa.
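
One possible shape for such a class dictionary entry is sketched below. The source does not give the weighting formula in this excerpt, so the choice `weight = 1 / level` is purely an illustrative assumption that satisfies the stated property (closer ancestors get larger weights); the function name `build_class_entry` and the sample levels are likewise hypothetical.

```python
def build_class_entry(ancestor_levels):
    """Turn {ancestor: level} into (ancestor, weight) pairs, closest ancestor first.

    ASSUMPTION: weight = 1 / level. The patent's actual weighting is not shown
    in this excerpt; any function that decreases with distance would fit.
    """
    pairs = [(ancestor, 1.0 / level) for ancestor, level in ancestor_levels.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Hypothetical levels for the term "wheat" (1 = direct parent).
entry = build_class_entry({"plant": 3, "cereal crop": 1, "crop": 2})
```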

[0054] The following is an automatic construction method for class dictionaries, where the parameter K is t...

Abstract

The invention discloses a text similarity detection device. The detection proceeds in the following steps: constructing a thesaurus from the classification labels of Baidu Encyclopedia entries; inputting the two Chinese documents to be compared and pre-processing each of them; filtering the words in the two documents and removing repeated words to generate a word item set; dividing the word items into a specialized word set and a common word set; aligning the specialized words of two sentences from the two documents, and likewise aligning their common words; calculating the similarity of each word relative to the aligned word of the same type; and calculating the similarity of each pair of sentences in the two documents. The method saves manpower to the greatest extent and improves the accuracy and speed with which a computer network system judges Chinese text.
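
The flow of the steps listed in the abstract can be sketched roughly as below. Everything concrete here is an assumption made for illustration: the stop-word list, the `SPECIALIZED` term set, the exact-match `word_sim` placeholder (standing in for a thesaurus-based word similarity), the best-match alignment, and the equal weighting of the two word sets are not taken from the patent. Input sentences are assumed to be already segmented into tokens.

```python
STOP_WORDS = {"the", "of", "is"}          # hypothetical filter list
SPECIALIZED = {"wheat", "maize", "rust"}  # hypothetical: terms present in the thesaurus

def word_items(tokens):
    """Filter stop words and drop repeated words, keeping first-seen order."""
    seen, items = set(), []
    for token in tokens:
        if token not in STOP_WORDS and token not in seen:
            seen.add(token)
            items.append(token)
    return items

def split_items(items):
    """Divide word items into a specialized word list and a common word list."""
    specialized = [t for t in items if t in SPECIALIZED]
    common = [t for t in items if t not in SPECIALIZED]
    return specialized, common

def word_sim(a, b):
    """Placeholder word similarity (exact match only)."""
    return 1.0 if a == b else 0.0

def aligned_sim(xs, ys):
    """Align each word with its best match on the other side and average the scores."""
    if not xs or not ys:
        return 0.0
    forward = sum(max(word_sim(x, y) for y in ys) for x in xs)
    backward = sum(max(word_sim(y, x) for x in xs) for y in ys)
    return (forward + backward) / (len(xs) + len(ys))

def sentence_sim(tokens_a, tokens_b, spec_weight=0.5):
    """Combine specialized-word and common-word similarity (weighting is an assumption)."""
    spec_a, common_a = split_items(word_items(tokens_a))
    spec_b, common_b = split_items(word_items(tokens_b))
    return (spec_weight * aligned_sim(spec_a, spec_b)
            + (1 - spec_weight) * aligned_sim(common_a, common_b))

score = sentence_sim(["wheat", "wheat", "the", "grows"], ["maize", "grows"])
```

Keeping specialized and common words in separate alignments means a mismatch in domain terms lowers the score even when the surrounding everyday vocabulary is identical.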

Description

Technical field

[0001] The invention relates to the field of natural language processing, and more specifically to a text similarity detection method.

Background technique

[0002] With the rapid development of computer technology and the rapid popularization of the Internet, human information exchange has become more and more convenient and fast, which also makes copying, plagiarism, and similar unethical behavior much easier. In colleges and universities in particular, teachers do not have enough time and energy to check essay-style assignments for plagiarism and lack effective automatic detection tools, so plagiarism among students is becoming more and more serious. To address this problem, the present invention studies automatic plagiarism detection technology for Chinese paper assignments ("Chinese papers" for short).

[0003] In fact, paper plagiarism detection is a d...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30
Inventor: 陈瑛, 高万林, 季烜, 任延昭, 张港红
Owner: CHINA AGRI UNIV