Cross-linguistic plagiarism detection method based on multiple features

A cross-language detection method applied in natural language translation, natural language data processing, and special data processing applications. It addresses the high training complexity of classification algorithms and their tendency to overfit, with the effects of narrowing the scope of plagiarism detection, improving the level of scientific research, and avoiding inaccurate disambiguation.

Active Publication Date: 2018-03-30
HARBIN ENG UNIV

AI Technical Summary

Problems solved by technology

[0008] With limited data, a classification algorithm that uses too many features has excessive training complexity and overfits easily.



Embodiment Construction

[0036] Detecting cross-language plagiarism involves two stages: first determine whether an article contains cross-language plagiarism at all, and then determine which passages or parts of the flagged articles are plagiarized. To address this, the present invention discovers and selects effective translation features from Chinese articles that contain cross-language plagiarism, assigns each feature a weight, and constructs a classification model for cross-language plagiarism. The model classifies given Chinese articles to detect which of them may contain plagiarism and which do not.
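The idea of weighting translation features and classifying an article can be sketched as follows. This is a minimal illustration, not the patented model: the marker set in `MARKERS`, the values in `WEIGHTS`, and `THRESHOLD` are all hypothetical, since the text above does not list the actual features or how their weights are obtained.

```python
# Hypothetical translation-style markers; the source does not list its features.
MARKERS = {
    "passive_bei": "被",     # explicit passive marker
    "particle_de": "的",     # attributive particle
    "conjunction_he": "和",  # coordinating conjunction
}

# Hypothetical weights; in practice they would be learned from a labelled corpus.
WEIGHTS = {"passive_bei": 3.0, "particle_de": 1.0, "conjunction_he": 1.5}

# Hypothetical decision threshold on the weighted score.
THRESHOLD = 0.12


def extract_features(text):
    """Return each marker's frequency per non-whitespace character."""
    length = max(sum(1 for c in text if not c.isspace()), 1)
    return {name: text.count(token) / length for name, token in MARKERS.items()}


def score(text):
    """Weighted sum of the feature values."""
    features = extract_features(text)
    return sum(WEIGHTS[name] * value for name, value in features.items())


def is_suspect(text):
    """Flag an article whose weighted score reaches the threshold."""
    return score(text) >= THRESHOLD
```

A linear score with a fixed threshold is only the simplest possible stand-in for a classification model; any trained classifier could take the same feature dictionary as input.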

[0037] The present invention aims to solve the problem of cross-language plagiarism by building a multi-feature cross-language plagiarism detection technique on features mined from translations. The present invention first analyzes and summarizes the research status of sin...


Abstract

The invention provides a cross-linguistic plagiarism detection method based on multiple features. The method comprises the following steps:

1. Corpus building.
2. Translation feature building. Translation features are built from the Europeanization and translation-style phenomena that commonly occur in translated articles. Feature selection then cleans and filters them, keeping the effective features and discarding features that are ineffective or have no apparent effect.
3. Feature selection. Effective features are selected from the multiple features to train a classifier, which then decides whether one or more articles contain cross-linguistic plagiarism.
4. Feature-based plagiarism detection. The Chinese features are matched to their exact English counterparts, plagiarism results are filtered and generated according to the translation features and structural features, and the results are given a final confirmation through WordNet.

With this method, the cross-linguistic plagiarism problem can be addressed using the multiple kinds of features mined from translations.
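The filtering of ineffective features before classifier training can be illustrated with a small sketch. The abstract does not say which selection criterion is used; information gain on binary features is assumed here purely for illustration, and the function names are hypothetical.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        result -= p * math.log2(p)
    return result


def information_gain(column, labels):
    """Entropy reduction from splitting the labels on one feature column."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        gain -= len(subset) / n * entropy(subset)
    return gain


def select_features(rows, labels, names, k):
    """Keep the k feature names with the highest information gain."""
    scored = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scored.append((information_gain(column, labels), name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]
```

For example, with `rows = [[1, 0], [1, 1], [0, 0], [0, 1]]` and `labels = [1, 1, 0, 0]`, the first column separates the classes perfectly and the second carries no information, so `select_features(rows, labels, ["a", "b"], 1)` keeps only `"a"`. Keeping fewer features is also one way to limit overfitting when the training corpus is small.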

Description

Technical field

[0001] The invention relates to a method for detecting plagiarism in an article.

Background technique

[0002] (1) Discovery of Europeanization and translation-style problems in English-Chinese translation

[0003] Conversion between English and Chinese has brought subtle changes to both languages, affecting accent, vocabulary, grammar, rhetoric and other factors. Although the two languages influence each other, the influence of English on Chinese is far greater than that of Chinese on English. As monolingual plagiarism detection became increasingly unable to handle the academic misconduct it encountered, cross-language plagiarism detection emerged. However, monolingual plagiarism detection techniques are not suitable for cross-lingual plagiarism detection. Currently, the most mainstream methods for cross-lingual plagiarism detection include Cross-lingual Information Retrieval (CLIR) and Cross-lingual Similari...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F17/30, G06F17/28, G06F17/27
CPC: G06F16/335, G06F16/337, G06F40/205, G06F40/253, G06F40/58
Inventor: 刘刚, 胡昱临, 李光曦
Owner: HARBIN ENG UNIV