Tibetan thesis copy detection method and system based on the Tibetan sentence level

A technology for detecting copying in Tibetan texts and papers, applied in the field of Tibetan text copy detection methods and systems. It addresses the problems of low detection accuracy and low detection efficiency by reducing the number of pairwise comparisons, improving accuracy, and simplifying complex relationships.

Pending Publication Date: 2016-12-14
QINGHAI UNIV FOR NATITIES
4 Cites, 13 Cited by

AI Technical Summary

Problems solved by technology

[0005] To solve the low detection efficiency and low detection accuracy of existing approaches, which rely on manual detection of plagiarized papers, the present invention provides a method and system for detecting copying in Tibetan papers at the Tibetan sentence level.


Embodiment Construction

[0039] In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below with reference to specific embodiments and accompanying drawings.

[0040] 1. Research on copy detection technology of Tibetan papers

[0041] To address plagiarism in Tibetan papers, the present invention proposes a method and system for detecting copying in Tibetan papers at the Tibetan sentence level, using the vector-space cosine similarity algorithm to calculate the similarity between the sentences of two papers. The key to the algorithm is selecting the feature vector: the feature vectors are used to generate a vector space model, the cosine similarity is then calculated, and the resulting similarity value is used to judge whether the two papers involve plagiarism and to what degree. In this method, the stop words in the sentence are eliminated, a...
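The cosine similarity step described in [0041] can be illustrated with a minimal Python sketch. It is not the patented implementation: the use of syllables (split on the tsheg mark, U+0F0B) as features, the tiny stop-word list, and the function names `tokenize` and `cosine_similarity` are all assumptions made for illustration, since the text here does not specify how the feature vector is chosen.

```python
import math
from collections import Counter

TSHEG = "\u0f0b"  # Tibetan syllable delimiter (assumed feature boundary)
SHAD = "\u0f0d"   # Tibetan clause/sentence mark

# Hypothetical stop-word list; the source does not give one.
STOP_WORDS = {"དང", "ནི"}


def tokenize(sentence, stop_words=STOP_WORDS):
    """Split a sentence into syllables and drop stop words."""
    trimmed = sentence.strip(SHAD + " \n")
    syllables = [s for s in trimmed.split(TSHEG) if s]
    return [s for s in syllables if s not in stop_words]


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two term-frequency vectors."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is similar to nothing
    return dot / (norm_a * norm_b)
```

A value near 1 means the two sentences share almost the same features; a value of 0 means they share none. Any threshold for flagging a sentence pair as copied would have to be chosen separately.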


Abstract

The invention discloses a Tibetan thesis copy detection method and system based on the Tibetan sentence level. The method includes: preprocessing the Tibetan text characters by code conversion and noise removal; segmenting the text into sentence-level text blocks through Tibetan sentence boundary identification, and establishing a temporary table of the segmented text blocks; extracting text features from a sentence-document inverted index table and the temporary table according to the sentence numbers, and computing them to obtain sentence similarity; and establishing an adjacency list according to the sentence similarity, computing text block similarity, and detecting copying between two Tibetan theses according to the text block similarity value. The system comprises a Tibetan thesis copy detection device and a database, wherein the detection device is connected to a client terminal server through the Internet, and the database is connected to the server and stores Tibetan theses. The detection device comprises a preprocessing module for code conversion and noise removal of the text characters, a temporary table module for constructing the segmented text blocks, an extracting module for constructing sentence text features, and a copy detection module for detecting whether the theses contain similar copied data.
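As a rough illustration of how sentence segmentation and an inverted index can cut down the number of pairwise comparisons, here is a minimal Python sketch. It rests on assumptions not stated in the abstract: sentence boundaries are approximated by runs of the shad mark (U+0F0D, U+0F0E), which is cruder than real Tibetan sentence boundary identification; index keys are syllables split on the tsheg mark; and the names `split_sentences`, `build_index`, and `candidates` are hypothetical.

```python
import re
from collections import defaultdict

SHAD_RUN = re.compile("[\u0f0d\u0f0e]+")  # shad and double shad
TSHEG = "\u0f0b"                          # syllable delimiter


def split_sentences(text):
    """Approximate sentence segmentation: split on runs of shad."""
    return [p.strip() for p in SHAD_RUN.split(text) if p.strip()]


def syllables(sentence):
    """Split a sentence into non-empty syllables."""
    return [s.strip() for s in sentence.split(TSHEG) if s.strip()]


def build_index(docs):
    """Map each syllable to the (doc_id, sentence_no) pairs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for sentence_no, sentence in enumerate(split_sentences(text)):
            for syl in syllables(sentence):
                index[syl].add((doc_id, sentence_no))
    return index


def candidates(index, sentence):
    """Return only the stored sentences sharing a syllable with `sentence`."""
    hits = set()
    for syl in syllables(sentence):
        hits |= index.get(syl, set())
    return hits
```

Only the sentences returned by `candidates` need a full similarity computation; sentences with no feature in common are never compared, which is one way the number of pairwise comparisons can be reduced.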

Description

Technical field

[0001] The invention belongs to the field of Tibetan language information processing, and in particular relates to a method and system for detecting copying in Tibetan texts at the Tibetan sentence level.

Background

[0002] Paper copy detection determines whether the content of a paper is plagiarized or copied from one or more other papers. The forms of plagiarism it covers mainly include complete copying, content shifting, synonym substitution, and rephrasing. When applied specifically to academic papers, copy detection is sometimes called "plagiarism detection".

[0003] The continuous development of the Internet and the increasing abundance of digital resources on the network have given people a convenient platform for resource sharing and information exchange. The Internet has become an important source of information, and at the same time provides a convenient academic...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27, G06F17/22
CPC: G06F16/3332, G06F16/951, G06F40/151, G06F40/289
Inventor: 看不太安见才让孙琦龙昝风彪
Owner: QINGHAI UNIV FOR NATITIES