Multi-language code plagiarism detection method based on pseudo twin network

Technology concerning a twin network and a detection method, applied in biological neural network models, neural learning methods, software maintenance/management, etc. It addresses problems such as the effects of redundant code and the failure to consider code structure features, with the effects of improved detection efficiency, broad application scope, and improved accuracy.
CN112394973A · Pending · Publication Date: 2021-02-23 · SHANDONG UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
SHANDONG UNIV OF TECH
Publication Date
2021-02-23


Abstract

The invention discloses a multi-language code plagiarism detection method based on a pseudo twin network. The method comprises the steps of: 1) obtaining basic data, which comprises a pre-training data set and a multi-language code plagiarism detection training data set; 2) preprocessing the pre-training data set to obtain an accurate mark vector; 3) preprocessing the multi-language code plagiarism detection training data set to preliminarily judge whether code is plagiarized; and 4) further judging whether plagiarism exists in the multi-language code plagiarism detection training data set. The method overcomes a limitation of existing machine-learning-based multi-language code plagiarism detection methods, which process code as text and so do not consider code structure characteristics. By combining structural characteristics of code derived from an abstract syntax tree, and embedding a convolutional neural network, a bidirectional long short-term memory neural network and a novel attention neural network into a pseudo twin network, it realizes multi-language code plagiarism detection and effectively improves the efficiency and precision of code plagiarism detection.
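As a rough illustration of why abstract-syntax-tree structure matters for plagiarism detection, the sketch below reduces a piece of source code to its sequence of AST node-type names and compares two such sequences. It is only a stand-in under stated assumptions: it uses Python's standard `ast` module, so it parses Python only rather than multiple languages, and it replaces the patent's neural pipeline (convolutional network, bidirectional LSTM and attention inside a pseudo twin network) with a plain sequence-matching ratio from `difflib`. The function names `node_type_sequence` and `structural_similarity` and the sample snippets are hypothetical.

```python
import ast
from difflib import SequenceMatcher


def node_type_sequence(source: str) -> list:
    """Return the AST node-type names of `source`, in the order ast.walk yields them."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def structural_similarity(source_a: str, source_b: str) -> float:
    """Compare two node-type sequences; returns a ratio in [0, 1].

    Assumption: a simple difflib ratio, not the learned similarity
    that the patent's pseudo twin network would produce.
    """
    seq_a = node_type_sequence(source_a)
    seq_b = node_type_sequence(source_b)
    return SequenceMatcher(None, seq_a, seq_b).ratio()


# Hypothetical samples: `renamed` is `original` with every identifier changed.
original = (
    "def total(xs):\n"
    "    s = 0\n"
    "    for x in xs:\n"
    "        s += x\n"
    "    return s\n"
)
renamed = (
    "def add_all(items):\n"
    "    acc = 0\n"
    "    for item in items:\n"
    "        acc += item\n"
    "    return acc\n"
)
different = "def greet(name):\n    print('hello', name)\n"

if __name__ == "__main__":
    print(structural_similarity(original, renamed))
    print(structural_similarity(original, different))
```

Because renaming identifiers does not change node types, the renamed copy yields exactly the same node-type sequence as the original, whereas a purely textual comparison would see many differing tokens. This is the kind of structural signal a text-only treatment of code would miss.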

Description

Technical field

[0001] The invention relates to the technical field of computer program code detection, and in particular to a method for detecting plagiarism in multi-language code based on a pseudo twin network.

Background technique

[0002] The development of the Internet has made source code easier to obtain, and it has also brought about the problem of code plagiarism. As a result, source code plagiarism detection has attracted growing research attention, and it has important applications in the teaching of computer programming courses. Many code plagiarism detection methods have emerged in recent years. Existing methods are mainly used to detect similarity between code written in the same language, but the grammatical differences between programming languages make them unsuitable for detecting similarity between code written in different languages; in ...

Claims
