Paper text similarity detection method based on citation network

A text similarity detection method applied to the field of paper text similarity detection. It addresses the low recognition rate, low efficiency, and inability to detect intelligent plagiarism of existing methods.

Active Publication Date: 2019-11-22
PEKING UNIV +1

AI Technical Summary

Problems solved by technology

Existing paper text similarity detection methods cannot detect intelligent plagiarism, and they suffer from low efficiency and a low recognition rate.



Embodiment Construction

[0046] As shown in Figure 1, the citation-network-based method for detecting the similarity of paper texts of the present invention comprises the following specific steps:

[0047] 1. Citation network extraction or construction. The specific operations are as follows:

[0048] First, the citation network database is searched by paper title and author. If the paper is in the database, its citation network is extracted directly from the database; if it is not, its references are parsed and its citation network is constructed from the reference structure (as shown in Figure 2). For example, for document T with author a, the retrieval condition (T, a) is used to search the citation network database; if (T, a) is not in the database, the references of T are parsed to produce a citation network. For example, if the references of document T are (T1, a1), (T2, a2), (T3, a3), then respectively (T1, a1...
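The lookup-or-build decision in step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the in-memory dictionary standing in for the citation network database, the function name `extract_or_build`, and the `parse_references` callback are all assumptions. Because the source text is cut off at the point where the individual references are processed, the sketch stops after linking the paper to its parsed references.

```python
from typing import Callable, Dict, List, Tuple

Paper = Tuple[str, str]  # (title, author), the retrieval condition (T, a)
Network = Dict[Paper, List[Paper]]  # paper -> papers it cites


def extract_or_build(
    paper: Paper,
    citation_db: Network,  # hypothetical stand-in for the citation network database
    parse_references: Callable[[Paper], List[Paper]],  # hypothetical reference parser
) -> Tuple[Network, str]:
    """Return the paper's citation network and where it came from."""
    if paper in citation_db:
        # Paper is known: extract its citation network directly.
        return {paper: list(citation_db[paper])}, "database"
    # Paper is unknown: parse its reference list and build the network from it.
    references = parse_references(paper)
    return {paper: list(references)}, "parsed"


# Usage with made-up data.
db = {("T", "a"): [("T1", "a1"), ("T2", "a2")]}
network, source = extract_or_build(("T", "a"), db, lambda p: [])
```

Returning the source alongside the network makes it easy to see which branch was taken; a real system would presumably also store newly built networks back into the database, which the source text does not describe.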


Abstract

The invention provides a paper text similarity detection method based on a citation network. On the basis of the citation network, the method filters a candidate document set by combining bibliographic coupling with semantic fingerprints. A word-based, sentence-level inverted index is then built over the candidate document set. Similar-sentence detection and viewpoint-fragment detection are performed and the similar text is generated. A copy ratio is calculated for the document under test in order to judge the similarity of the paper text. Sentence comparison and viewpoint detection are based on word vectors: introducing word vectors and synonyms into the text similarity calculation improves the sentence similarity results. The method is fast, can detect text fragments in a thesis that may involve viewpoint plagiarism, and performs well on sentence similarity detection when plagiarism takes forms such as word replacement and sentence recombination.
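The pipeline in the abstract can be illustrated with a small sketch. It is written under several assumptions that do not come from the patent: bibliographic coupling is taken as the count of shared references, the semantic-fingerprint filter is omitted, Jaccard word overlap is swapped in for the word-vector and synonym based sentence similarity, and the copy ratio is taken to be the fraction of sentences with a match above a hypothetical threshold. All function names are illustrative.

```python
import re
from collections import defaultdict


def coupling_strength(refs_a, refs_b):
    """Bibliographic coupling: number of references two documents share."""
    return len(set(refs_a) & set(refs_b))


def split_sentences(text):
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def build_inverted_index(candidates):
    """Map each word to the (doc_id, sentence_no) pairs that contain it."""
    index = defaultdict(set)
    for doc_id, text in candidates.items():
        for i, sentence in enumerate(split_sentences(text)):
            for word in tokenize(sentence):
                index[word].add((doc_id, i))
    return index


def jaccard(a, b):
    # Assumption: stand-in for the patent's word-vector sentence similarity.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def copy_ratio(text, candidates, threshold=0.6):
    """Fraction of sentences in `text` with a similar candidate sentence.

    The threshold value and this definition of the ratio are assumptions.
    """
    index = build_inverted_index(candidates)
    cand_sentences = {d: split_sentences(t) for d, t in candidates.items()}
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    flagged = 0
    for sentence in sentences:
        words = tokenize(sentence)
        # Only sentences sharing at least one word are compared.
        postings = set()
        for word in words:
            postings |= index.get(word, set())
        if any(
            jaccard(words, tokenize(cand_sentences[d][i])) >= threshold
            for d, i in postings
        ):
            flagged += 1
    return flagged / len(sentences)


candidates = {"d1": "Plagiarism damages research fairness. Graphs are useful."}
ratio = copy_ratio(
    "Plagiarism damages research fairness. We propose a new method.", candidates
)
```

The inverted index is what keeps the comparison cheap: each sentence of the document under test is scored only against candidate sentences that share a word with it, rather than against every sentence in the candidate set.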

Description

Technical field

[0001] The invention provides a method for detecting text similarity of papers, in particular to a method for detecting text similarity of papers based on a citation network, which belongs to the field of text detection.

Background

[0002] Plagiarism not only violates the basic spirit of scientific research, but also seriously damages the fairness of scientific research and the rights and interests of other personnel. With the further development of the information society, online blogs, databases, etc. make the cost of obtaining information lower and lower, and at the same time make plagiarism more and more convenient. Paper plagiarism mainly refers to editing, piecing together, and revising other people's language, chart formulas or research ideas into one's own paper, and publishing it as one's own work without citing. Therefore, an effective text similarity detection method is needed to deal with plagiarism.

[0003] At present, there are two...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06K9/62
CPC: G06F18/22
Inventor: 武山山, 王继民, 罗鹏程, 赵常煜
Owner: PEKING UNIV