Cross-linguistic plagiarism detection method based on multiple features

A cross-language detection method applicable to natural language translation, natural language data processing, special data processing applications, and similar fields. It addresses problems such as the training complexity of classification algorithms and their tendency to over-fit, with the effects of narrowing the scope of plagiarism detection, raising the level of scientific research, and avoiding inaccurate disambiguation.
CN107862045A | Active | Publication Date: 2018-03-30 | HARBIN ENG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HARBIN ENG UNIV
Publication Date
2018-03-30

Abstract

The invention provides a cross-linguistic plagiarism detection method based on multiple features. The method comprises the following steps: (1) corpus building; (2) translation feature building, in which features are constructed from the Europeanization phenomenon and the translation style (translationese) problem that commonly appear in translated articles, and the features are then cleaned and filtered by feature selection so that effective features are retained and ineffective or weakly effective features are discarded; (3) feature selection, in which the effective features are selected from the multiple features to train a classifier, and the classifier then determines whether one or more articles exhibit cross-linguistic plagiarism; (4) plagiarism detection based on feature correspondence, in which each Chinese feature is matched precisely to its corresponding English feature, plagiarism results are filtered and generated according to the translation features and the structural features, and the results receive final confirmation through WordNet. With this method, the cross-linguistic plagiarism problem can be addressed using the multiple kinds of features mined from translation.
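Steps (2) and (3) of the abstract can be illustrated with a minimal sketch. The abstract does not name the concrete features, the selection criterion, or the classifier, so everything here is an assumption made for illustration: the marker list `MARKERS`, the mean-separation score, the threshold of 1.0, and the nearest-centroid classifier are all hypothetical stand-ins, not the patented method.

```python
from statistics import mean, pstdev

# Hypothetical markers sometimes associated with Europeanized Chinese.
# The source does not list its features; these are illustrative only.
MARKERS = ["被", "们", "关于", "的"]


def extract_features(text):
    """Relative frequency of each marker per character of text."""
    n = max(len(text), 1)
    return [text.count(m) / n for m in MARKERS]


def separation_scores(X, y):
    """Score each feature by |difference of class means| / summed class spread."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        spread = pstdev(pos) + pstdev(neg) + 1e-9  # avoid division by zero
        scores.append(abs(mean(pos) - mean(neg)) / spread)
    return scores


def select_features(X, y, threshold=1.0):
    """Keep the indices of features whose score reaches the (assumed) threshold."""
    return [j for j, s in enumerate(separation_scores(X, y)) if s >= threshold]


def train_centroids(X, y, keep):
    """Nearest-centroid 'classifier': one mean vector per class, kept features only."""
    centroids = {}
    for label in set(y):
        rows = [[row[j] for j in keep] for row, l in zip(X, y) if l == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, features, keep):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    v = [features[j] for j in keep]
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(v, centroids[label])),
    )


# Toy feature matrix: label 1 = translated (suspect), label 0 = native text.
X = [[0.9, 0.5], [0.8, 0.4], [0.1, 0.5], [0.2, 0.4]]
y = [1, 1, 0, 0]
keep = select_features(X, y)
model = train_centroids(X, y, keep)
print(keep, predict(model, [0.7, 0.9], keep))
```

In this toy data the second feature has the same mean in both classes, so the selection step drops it and the classifier is trained on the first feature alone, which mirrors the abstract's idea of filtering out features with no apparent effect before training.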

Description

Technical field

[0001] The invention relates to a method for detecting plagiarism in an article.

Background technique

[0002] (1) Discovery of Europeanization and translation style problems in English-Chinese translation

[0003] Translation between English and Chinese has brought subtle changes to both languages, affecting pronunciation, vocabulary, grammar, rhetoric and other aspects. Although the two languages influence each other, the influence of English on Chinese is far greater than that of Chinese on English. As monolingual plagiarism detection became increasingly unable to cope with the academic misconduct it encountered, cross-language plagiarism detection emerged. However, monolingual plagiarism detection techniques are not suitable for cross-lingual plagiarism detection. Currently, the most mainstream methods for cross-lingual plagiarism detection include Cross-lingual Information Retrieval (CLIR) and Cross-lingual Similari...

Claims
