Duplicate checking method

A sentence-based duplicate checking technology, applied in the computer field, that addresses long calculation times and the difficulty of detecting duplicates after word order is changed, words are substituted, or words are added or removed, thereby improving the efficiency of duplicate checking with strong practicability and high accuracy.

Inactive Publication Date: 2018-10-09
CENT SOUTH UNIV

AI Technical Summary

Problems solved by technology

However, most existing document plagiarism checks test whether texts are identical or measure how much of them is identical. Most of these comparisons use segmented feature words or complete sentences as the minimum unit. Such methods suffer from long calculation times and have difficulty detecting duplicates after synonym replacement. Sentence-based duplicate checking shortens the calculation time, but it struggles to detect changed word order, substituted words, or added or removed words. Document duplicate checking therefore still offers broad scope for research and application.


Examples


Embodiment 1

[0036] A duplicate checking method, comprising the following steps:

[0037] In the first step, a Word2Vec model is trained to obtain the sentence vectors of the original sentence and the comparison sentence; each sentence vector combines word vectors and part-of-speech vectors.

[0038] In the second step, the included angle θ between the sentence vector of the original sentence and the sentence vector of the comparison sentence is calculated from the two sentence vectors.

[0039] In the third step, the similarity between the original sentence and the comparison sentence is judged. Specifically, when the angle θ is less than or equal to the threshold, the original sentence is determined to be similar to the comparison sentence; when the angle θ is greater than the threshold, the original sentence is determined not to be similar to the comparison sentence.
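The second and third steps can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names, the example vectors, and the 15-degree threshold are all assumptions, since the text above does not state a threshold value or how θ is computed. The sketch uses the standard arccosine of the normalized dot product.

```python
import math

def included_angle(u, v):
    """Return the angle in radians between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)

def is_similar(original_vec, comparison_vec, threshold):
    """Third step: similar when the angle is at most the threshold."""
    return included_angle(original_vec, comparison_vec) <= threshold

# Hypothetical values chosen only for illustration.
THRESHOLD = math.radians(15)
original = [0.8, 0.1, 0.3]
near_copy = [0.7, 0.2, 0.3]
unrelated = [-0.2, 0.9, -0.4]

print(is_similar(original, near_copy, THRESHOLD))
print(is_similar(original, unrelated, THRESHOLD))
```

With these made-up vectors the near copy falls inside the threshold and the unrelated vector does not. In practice the threshold would have to be tuned on labelled sentence pairs.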

[0040] The meth...

Embodiment 2

[0054] A duplicate checking method, comprising the following steps:

[0055] In the first step, a Word2Vec model is trained to obtain the sentence vectors of the original sentence and the comparison sentence; each sentence vector combines word vectors and part-of-speech vectors.

[0056] In the second step, the included angle θ between the sentence vector of the original sentence and the sentence vector of the comparison sentence is calculated from the two sentence vectors.

[0057] In the third step, the similarity between the original sentence and the comparison sentence is judged. Specifically, when the angle θ is less than or equal to the threshold, the original sentence is determined to be similar to the comparison sentence; when the angle θ is greater than the threshold, the original sentence is determined not to be similar to the comparison sentence.
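The first step says only that the sentence vector combines word vectors and part-of-speech vectors; the visible text does not say how. The sketch below shows one possible combination under loudly stated assumptions: each token's word vector is concatenated with its part-of-speech vector, and the token vectors are averaged. The lookup tables are toy values standing in for trained Word2Vec output, and every name here is hypothetical.

```python
# Toy lookup tables standing in for trained Word2Vec output (hypothetical values).
WORD_VECTORS = {
    "students": [0.9, 0.1],
    "copy": [0.2, 0.8],
    "papers": [0.7, 0.3],
}
POS_VECTORS = {
    "NOUN": [1.0, 0.0],
    "VERB": [0.0, 1.0],
}

def token_vector(word, pos):
    """Concatenate a word vector with its part-of-speech vector (assumed scheme)."""
    return WORD_VECTORS[word] + POS_VECTORS[pos]

def sentence_vector(tagged_tokens):
    """Average the token vectors of a sentence, component by component."""
    if not tagged_tokens:
        raise ValueError("sentence has no tokens")
    vectors = [token_vector(word, pos) for word, pos in tagged_tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# The same tagged tokens in two different orders.
original = [("students", "NOUN"), ("copy", "VERB"), ("papers", "NOUN")]
reordered = [("papers", "NOUN"), ("students", "NOUN"), ("copy", "VERB")]

print(sentence_vector(original))
print(sentence_vector(reordered))
```

Because averaging ignores token order, the reordered sentence maps to the same vector up to floating-point rounding. That is a property of the averaging assumed here, not a statement about the scheme the patent actually uses.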

[0058] The metho...


Abstract

The invention provides a duplicate checking method comprising the following steps. In the first step, a Word2Vec model is trained to obtain the sentence vectors of an original sentence and a comparison sentence, each sentence vector being obtained by integrating word vectors and part-of-speech vectors. In the second step, the included angle between the sentence vector of the original sentence and the sentence vector of the comparison sentence is calculated from the two sentence vectors. In the third step, the similarity between the original sentence and the comparison sentence is determined: when the included angle is less than or equal to a threshold, the original sentence is determined to be similar to the comparison sentence; when the angle is greater than the threshold, the original sentence is determined not to be similar to the comparison sentence. The method considers word vectors and part-of-speech vectors together. Its calculation time is markedly shorter than that of methods based on sentence coding. Introducing the part-of-speech vector helps with the difficulty of checking after synonym replacement, and the method addresses the difficulty that complete-sentence comparison has in detecting word changes, order changes, and added or deleted words. Overall, the method improves both the accuracy and the efficiency of duplicate checking.
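For reference, the included angle and the decision rule can be written compactly. The symbols are introduced here for illustration and do not appear in the source: s_o and s_c denote the sentence vectors of the original and comparison sentences, and τ denotes the threshold. The arccosine form is the standard definition of the angle between two vectors and is assumed to be what the method intends.

```latex
\theta = \arccos\!\left(
  \frac{\mathbf{s}_o \cdot \mathbf{s}_c}
       {\lVert \mathbf{s}_o \rVert \, \lVert \mathbf{s}_c \rVert}
\right),
\qquad
\text{similar} \iff \theta \le \tau
```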

Description

Technical field

[0001] The invention relates to the technical field of computers, in particular to a duplicate checking method.

Background

[0002] As people pay more and more attention to science, technology, and social development, the academic field is becoming increasingly diverse, information-driven, and modern. Under such circumstances, efficient, comprehensive, and convenient access to academic information is needed more urgently than ever. On the other hand, cases of academic misconduct still exist. Academic misconduct refers to falsification, improper conduct, or a breakdown of norms in the academic world, including plagiarizing the research results of others, which corrupts the academic atmosphere, hinders academic progress, and violates the scientific spirit and scientific ethics. Academic misconduct abandons the principle of authenticity and integrity of scientific experimental data, which has brought serious negative ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30
CPC: G06F40/211, G06F40/284
Inventor: 郑瑾, 欧丽玲, 张祖平
Owner: CENT SOUTH UNIV