
A Cross-lingual Plagiarism Detection Method Based on Fingerprint Fusion

A cross-lingual plagiarism detection method, applied in natural language data processing, semantic analysis, digital data information retrieval, etc., that addresses the problem of detecting plagiarism.

Active Publication Date: 2021-03-30
HARBIN ENG UNIV

AI Technical Summary

Problems solved by technology

However, digital fingerprint technology also has disadvantages. Because it generates fingerprints from continuous text, it generally handles only verbatim plagiarism such as copy and paste, and it detects intelligent plagiarism such as paraphrasing, synonym replacement, and reordering poorly, so there is still room for improvement and research.
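The limitation described above can be illustrated with a minimal sketch. It assumes a simple word 3-gram fingerprint hashed with MD5, which is a common textbook scheme and not necessarily the one the patent has in mind; the function names and sample sentences are hypothetical.

```python
import hashlib

def kgram_fingerprints(text, k=3):
    """Hash every run of k consecutive words into a set of fingerprints."""
    words = text.lower().split()
    grams = (" ".join(words[i:i + k]) for i in range(len(words) - k + 1))
    return {hashlib.md5(g.encode("utf-8")).hexdigest() for g in grams}

def overlap(candidate, source):
    """Fraction of the candidate's fingerprints that also occur in the source."""
    return len(candidate & source) / len(candidate) if candidate else 0.0

source = "the quick brown fox jumps over the lazy dog"
copied = "the quick brown fox jumps over the lazy dog"      # copy and paste
reworded = "the fast brown fox leaps over the idle dog"     # synonym replacement

source_fp = kgram_fingerprints(source)
copy_score = overlap(kgram_fingerprints(copied), source_fp)
reworded_score = overlap(kgram_fingerprints(reworded), source_fp)
print(copy_score, reworded_score)
```

In this toy case the verbatim copy shares every fingerprint with the source, while three substituted words are enough to break every 3-gram, so the reworded sentence shares none.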



Examples


Embodiment Construction

[0069] The following examples describe the present invention in more detail.

[0070] 1. Text preprocessing

[0071] Text preprocessing includes word segmentation, part-of-speech tagging, stop word removal, etc. English text additionally needs root restoration (stemming). Chinese is complex and polysemous, and Chinese text has no segmentation marks like the spaces in English text, only punctuation marks, so preprocessing Chinese text is more complicated. The accuracy of text preprocessing also has a great impact on the subsequent experimental results. The Chinese text and the English text need to be preprocessed separately to obtain the noun sequences.

[0072] Input: text information to be analyzed

[0073] Output: Chinese and English feature sets

[0074] Step 1: Chinese text preprocessing. The Chinese text is preprocessed using the Chinese lexical analysis system ICTCLAS of the Chinese Academy of Sciences, and the program directly calls the API of ICTCLAS ...
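For the English side of the preprocessing described above, a minimal sketch might look as follows. It covers only tokenization, stop word removal, and root restoration; part-of-speech tagging and noun extraction are omitted. The stop word list and the suffix-stripping rule are hypothetical stand-ins for a full stop list and a real stemmer, and none of the names come from the patent.

```python
import re

# Hypothetical, deliberately tiny stop word list (assumption, not from the patent).
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "in", "on", "to", "is", "are"}

def restore_root(word):
    """Naive plural stripping as a stand-in for real stemming or lemmatization."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def preprocess_english(text):
    """Lowercase, split on non-letters, drop stop words, reduce words to roots."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [restore_root(t) for t in tokens if t not in STOP_WORDS]

features = preprocess_english("The students copied the documents of the class")
print(features)
```

A production pipeline would replace `restore_root` with an established stemmer and add a part-of-speech tagger so that only nouns are kept.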



Abstract

The invention provides a cross-language plagiarism detection method based on fingerprint fusion. Noun sequences are extracted through natural language processing from the Chinese and English text sets to be checked for plagiarism, and an intermediate-fingerprint encoding algorithm uses the tree-shaped noun structure of WordNet to encode the noun sequences as intermediate fingerprints. Semantic density is then used to disambiguate the fingerprint codes on the basis of the intermediate fingerprints. A fingerprint selection strategy extracts Chinese and English fingerprints that represent the semantics of the current segment, a Dice coefficient is used to calculate the similarity of the fingerprints, and potential plagiarism segments are selected by comparing the result with a threshold value. Similarity between sentences is then calculated according to a SinWin algorithm, plagiarized sentences are selected through a threshold value, and the final plagiarism detection result is formed by merging plagiarism segments. The method crosses the language barrier in the cross-language similarity retrieval phase, and is suitable and highly efficient for longer paragraphs.
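The Dice-coefficient step of the abstract can be sketched as follows. This is a minimal illustration that assumes each segment's fingerprints are held as a set of code strings; the code values, segment identifiers, and the threshold of 0.5 are hypothetical and are not taken from the patent.

```python
def dice(a, b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) of two fingerprint sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

def candidate_segments(suspicious, sources, threshold=0.5):
    """Return (segment_id, score) for source segments scoring at or above threshold."""
    scored = ((seg_id, dice(suspicious, fp)) for seg_id, fp in sources.items())
    return [(seg_id, score) for seg_id, score in scored if score >= threshold]

# Hypothetical fingerprint codes for one Chinese segment and two English segments.
zh_segment = {"1.2.4", "1.3.1", "2.1.7"}
en_segments = {
    "s1": {"1.2.4", "1.3.1", "2.1.9"},
    "s2": {"3.1.1", "3.2.2"},
}

candidates = candidate_segments(zh_segment, en_segments)
print(candidates)
```

Here `s1` shares two of three fingerprints with the Chinese segment, giving a score of 2·2/(3+3) ≈ 0.67, so it is kept as a potential plagiarism segment, while `s2` shares none and is discarded.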

Description

Technical field

[0001] The invention relates to a cross-language plagiarism detection method.

Background technique

[0002] Plagiarism detection judges whether the content and ideas of a document are plagiarized or copied from other documents. It generally covers complete plagiarism, synonym replacement, modification plagiarism, translation plagiarism, viewpoint plagiarism, etc. External plagiarism detection evaluates a document against one or more source documents; internal plagiarism detection, which has no source documents, detects plagiarism through changes in writing style; and cross-lingual plagiarism detection must take language differences into account.

[0003] 1. External plagiarism detection

[0004] External plagiarism detection is given a suspicious text and retrieves, from the source document collection, the documents whose similarity with the suspicious text is greater than a certain threshold. The general system flow of external plagiarism detection is as foll...


Application Information

Patent Type & Authority Patents(China)
IPC IPC(8): G06F16/33, G06F16/335, G06F16/35, G06F40/30, G06F40/289, G06F21/10
CPC G06F21/10, G06F16/3344, G06F16/335, G06F16/35, G06F40/289, G06F40/30
Inventor 刘刚, 左权, 杨倩茹, 安立桐
Owner HARBIN ENG UNIV