Multi-language code plagiarism detection method based on pseudo twin network

A pseudo-twin network detection method, applied in biological neural network models, neural learning methods, software maintenance/management, etc. It addresses problems such as the effect of redundant code and the failure to consider code structure features, improving detection efficiency and accuracy and widening the range of application.

Pending Publication Date: 2021-02-23
SHANDONG UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art, and propose a multilingual code plagiarism detection method based on a pseudo-twin network, which




Embodiment Construction

[0039] The present invention will be further described below in conjunction with specific examples.

[0040] As shown in Figure 1, the pseudo-twin-network-based multilingual code plagiarism detection method provided in this embodiment pre-trains on the pre-training data to obtain accurate token vectors. It removes redundancy from the multilingual code plagiarism detection training data set, converts each code sample into an abstract syntax tree, and makes a preliminary judgment as to whether the code is plagiarized. It then makes a further judgment: the abstract syntax tree of the code is traversed depth-first to form a token sequence representing the code, each token in the sequence is replaced with its pre-trained token vector to form an embedding matrix, and the pseudo-twin network is used to detect whether the code is plagiarized. The method includes the following steps:

[0041] 1) Obtain basic data, using open source data...
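The conversion from code to embedding matrix described in [0040] can be illustrated with a minimal sketch. It is not the patented implementation. It assumes Python source only and uses the standard `ast` module, whereas the method targets several languages. The names `token_sequence`, `embedding_matrix`, and the toy `vectors` table are hypothetical, and the table stands in for the pre-trained token vectors.

```python
import ast


def token_sequence(source: str) -> list[str]:
    """Traverse the AST depth-first (pre-order) and emit node type names."""
    tokens: list[str] = []

    def visit(node: ast.AST) -> None:
        tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def embedding_matrix(tokens, vectors, dim):
    """Replace each token with its vector; unknown tokens map to zeros."""
    unknown = [0.0] * dim
    return [vectors.get(tok, unknown) for tok in tokens]


# Toy stand-in for pre-trained token vectors (assumption, dimension 3).
vectors = {
    "Module": [1.0, 0.0, 0.0],
    "FunctionDef": [0.0, 1.0, 0.0],
    "Return": [0.0, 0.0, 1.0],
    "BinOp": [0.5, 0.5, 0.0],
}

code = "def add(a, b):\n    return a + b\n"
seq = token_sequence(code)
matrix = embedding_matrix(seq, vectors, dim=3)
print(seq)
print(len(matrix), "rows x", len(matrix[0]), "columns")
```

Using node type names rather than identifiers means that renaming variables leaves the sequence unchanged, which is one reason a syntax-tree representation captures more structure than raw text.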


Abstract

The invention discloses a multi-language code plagiarism detection method based on a pseudo-twin network. The method comprises the following steps: 1) obtaining basic data, comprising a pre-training data set and a multi-language code plagiarism detection training data set; 2) preprocessing the pre-training data set to obtain accurate token vectors; 3) preprocessing the multi-language code plagiarism detection training data set and making a preliminary judgment as to whether the code is plagiarized; and 4) making a further judgment as to whether plagiarism exists in the multi-language code plagiarism detection training data set. The method overcomes a limitation of existing machine-learning-based multi-language code plagiarism detection methods, which treat code as plain text and ignore its structural features. By combining the structural features of code derived from the abstract syntax tree, and by embedding a convolutional neural network, a bidirectional long short-term memory (BiLSTM) network, and a novel attention network into a pseudo-twin network, it realizes multi-language code plagiarism detection and effectively improves detection efficiency and precision.
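As a rough illustration of the "pseudo-twin" idea, the sketch below compares two inputs using two branches that have the same shape but separate, unshared weights, which is what distinguishes a pseudo-twin (pseudo-Siamese) network from an ordinary twin network. It deliberately swaps in a trivial encoder (mean pooling plus one linear layer) for the CNN, BiLSTM, and attention layers named in the abstract. The weights are random rather than trained, and all names (`make_branch`, `encode`, `cosine`) are hypothetical.

```python
import math
import random


def make_branch(in_dim, out_dim, seed):
    """Create one branch's weight matrix; each branch gets its own weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]


def encode(matrix, weights):
    """Mean-pool the embedding rows, then apply a linear layer with tanh."""
    n, dim = len(matrix), len(matrix[0])
    pooled = [sum(row[i] for row in matrix) / n for i in range(dim)]
    return [math.tanh(sum(w * x for w, x in zip(w_row, pooled)))
            for w_row in weights]


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Two branches, same architecture, different (unshared) parameters.
branch_a = make_branch(in_dim=3, out_dim=4, seed=1)
branch_b = make_branch(in_dim=3, out_dim=4, seed=2)

# Toy embedding matrices for two code samples of different lengths.
emb_a = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
emb_b = [[0.2, 0.1, 0.3], [0.5, 0.4, 0.6], [0.0, 0.1, 0.2]]

score = cosine(encode(emb_a, branch_a), encode(emb_b, branch_b))
print(f"similarity score: {score:.3f}")
```

With untrained weights the score carries no meaning. In a trained model, both branches would be optimized jointly so that plagiarized pairs score high, and keeping the weights separate lets each branch adapt to a different programming language.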

Description

Technical field

[0001] The invention relates to the technical field of computer program code detection, and in particular to a method for detecting plagiarism in multilingual code based on a pseudo-twin network.

Background

[0002] The development of the Internet has made source code easier to obtain online, and it has also brought about the problem of code plagiarism. Source code plagiarism detection has therefore attracted growing research interest, and it has important applications in the teaching of computer programming courses. Many code plagiarism detection methods have emerged in recent years. Existing methods mainly detect similarity between code written in the same language, but the grammatical differences between programming languages make them unsuitable for detecting similarity between code written in different languages; in ...


Application Information

IPC(8): G06F8/70, G06N3/04, G06N3/08
CPC: G06F8/70, G06N3/049, G06N3/08, G06N3/045
Inventors: 刘聪, 李国繁, 张峰, 李会玲, 李彩虹, 王绍卿
Owner: SHANDONG UNIV OF TECH