Method and system for detecting program code plagiarism types based on a source code multi-label graph neural network

A neural network and program code technology, applied in the fields of biological neural network models, neural architectures, and program/content distribution protection. It addresses problems such as complex intermediate code representations, subjective judgments, and plagiarism between two pieces of code, and achieves a high accuracy rate.

Active Publication Date: 2018-08-24
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

Program code is strongly structured. A single detection technique can target only one or two kinds of plagiarism methods, and each class of technique has its own characteristics, advantages, and disadvantages. They perform differently in terms of space-time comp


Examples

Embodiment 1

[0060] As shown in Figure 2, the method for detecting program code plagiarism types based on the source code multi-label graph neural network provided by the present invention includes the following steps:

[0061] S1. For a code text, use a custom code micro-obfuscation tool to generate a plagiarized version of it, and record the plagiarism type at the same time;

[0062] S2. Extract the code property graph feature vectors from the code text and its plagiarized version;

[0063] S3. Integrate the code property graph feature vectors of the code text and its plagiarized version to provide a good input for the neural network; the integrated feature vectors of the code text and its plagiarized version serve as positive examples;

[0064] S4. Use the methods of steps S2 to S3 to integrate and obtain the code property graph feature vector of the code text paired with itself (code text-code text), which serves as a negative example;

[0065] S5. Use the neural network to define a mul...
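The data-preparation steps S1 to S4 can be sketched roughly as follows. This is a minimal illustration, not the patented tool: the two obfuscations, the `toy_features` extractor (simple token counts standing in for real code property graph features), and integration by concatenation are all assumptions made for the example, as are the function names.

```python
import re
from collections import Counter

KEYWORDS = {"def", "return", "for", "in", "if", "else"}

def rename_identifiers(code):
    """Toy obfuscation (assumed): replace each non-keyword identifier with id0, id1, ..."""
    mapping = {}
    def repl(match):
        name = match.group(0)
        if name in KEYWORDS:
            return name
        mapping.setdefault(name, f"id{len(mapping)}")
        return mapping[name]
    return re.sub(r"[A-Za-z_]\w*", repl, code)

def strip_comments(code):
    """Toy obfuscation (assumed): drop everything after '#' on each line."""
    return re.sub(r"#.*", "", code)

# Hypothetical plagiarism types; the source defines 10 but does not list them here.
OBFUSCATIONS = [("rename_identifiers", rename_identifiers),
                ("strip_comments", strip_comments)]

def toy_features(code):
    """Stand-in for S2: token-class counts, NOT a real code property graph."""
    counts = Counter()
    for tok in re.findall(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]", code):
        if tok in KEYWORDS:
            counts["keyword"] += 1
        elif re.match(r"[A-Za-z_]", tok):
            counts["identifier"] += 1
        elif tok.isdigit():
            counts["number"] += 1
        else:
            counts["symbol"] += 1
    return [counts[k] for k in ("keyword", "identifier", "number", "symbol")]

def integrate(vec_a, vec_b):
    """Stand-in for S3: integrate two feature vectors by concatenation (assumed)."""
    return vec_a + vec_b

def build_examples(code):
    """Return (features, labels) pairs: one positive per type, plus one negative."""
    original = toy_features(code)
    examples = []
    for index, (_, obfuscate) in enumerate(OBFUSCATIONS):
        plagiarized = obfuscate(code)                                  # S1
        features = integrate(original, toy_features(plagiarized))      # S2 + S3
        labels = [1 if i == index else 0 for i in range(len(OBFUSCATIONS))]
        examples.append((features, labels))
    # S4: the code text paired with itself is the negative example.
    examples.append((integrate(original, original), [0] * len(OBFUSCATIONS)))
    return examples
```

Calling `build_examples` on one code text yields one labeled pair per obfuscation type and a final all-zero-label pair for the code text paired with itself.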


Abstract

The invention relates to a method for detecting program code plagiarism types based on a source code multi-label graph neural network. The method comprises the following steps: S1, for a code text, generating a plagiarized version using a custom code micro-obfuscation tool and recording the plagiarism type; S2, extracting code property graph feature vectors from the code text and the plagiarized version; S3, integrating the code property graph feature vectors of the code text and the plagiarized version to provide a good input for a neural network, and setting the integrated feature vectors as positive examples; S4, using the methods of steps S2 and S3 to integrate and obtain the code property graph feature vector of the code text paired with itself, and setting it as a negative example; and S5, defining a multi-task learning network model using the neural network, training 10 classifiers simultaneously for each positive or negative example, and finally outputting a 10-dimensional vector in which each dimension represents a defined plagiarism type, thereby providing plagiarism evidence for an assessor.
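The multi-task output described in step S5 (10 classifiers trained together, producing a 10-dimensional vector) might look like the sketch below. The layer sizes, the `tanh` shared layer, the random untrained weights, and the summed binary cross-entropy loss are assumptions for illustration; the source does not specify the network architecture here.

```python
import math
import random

NUM_TYPES = 10  # from the abstract: one classifier per defined plagiarism type

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class MultiLabelHead:
    """Shared hidden layer followed by NUM_TYPES independent sigmoid classifiers."""

    def __init__(self, input_dim, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        # Randomly initialised, untrained weights (illustration only).
        self.w_shared = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)]
                         for _ in range(hidden_dim)]
        self.w_heads = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
                        for _ in range(NUM_TYPES)]

    def forward(self, features):
        # Shared representation used by every classifier.
        hidden = [math.tanh(sum(w * v for w, v in zip(row, features)))
                  for row in self.w_shared]
        # One probability per plagiarism type: the 10-dimensional output.
        return [sigmoid(sum(w * h for w, h in zip(row, hidden)))
                for row in self.w_heads]

def multi_label_loss(probs, labels):
    """Sum of per-type binary cross-entropy terms, so all classifiers train jointly."""
    eps = 1e-12
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels))
```

Each output dimension can be read independently, so a single code pair may be flagged for several plagiarism types at once, which is what distinguishes a multi-label head from a single softmax classifier.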

Description

Technical field

[0001] The invention relates to the field of code plagiarism detection, and more specifically to a method and system for detecting program code plagiarism types based on a source code multi-label graph neural network.

Background technique

[0002] Program code plagiarism means that the plagiarist's program code is obtained by directly copying another person's source code or by slightly modifying it. Code plagiarism detection refers to extracting code feature strings or fingerprints and then using a matching algorithm to calculate the similarity of two pieces of code. Plagiarism forensics refers to recording, during plagiarism detection, the likelihood that a particular plagiarism method was used, as a reference for suspected plagiarism. With the continuous development of information technology, the practicality and necessity of program code plagiarism detection in certain settings have become increasingly significant, especially in...
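The fingerprint-and-matching style of detection mentioned in the background can be illustrated with a generic sketch. It is not the invention's method: token k-grams as fingerprints and Jaccard overlap as the matching algorithm are choices assumed here for the example, and the function names are hypothetical.

```python
import re

def token_kgrams(code, k=3):
    """Fingerprints: the set of all runs of k consecutive tokens."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def jaccard_similarity(code_a, code_b, k=3):
    """Matching: shared fingerprints divided by all distinct fingerprints."""
    a, b = token_kgrams(code_a, k), token_kgrams(code_b, k)
    if not a and not b:
        return 1.0  # two inputs too short to fingerprint are treated as identical
    return len(a & b) / len(a | b)
```

A score like this gives only an overall similarity; it does not say which kind of modification was applied, which is the gap the plagiarism-type output is meant to address.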


Application Information

IPC(8): G06F21/12, G06K9/62, G06N3/04
CPC: G06F21/125, G06N3/04, G06F18/24
Inventors: 万海, 刘欣怡
Owner: SUN YAT SEN UNIV