A method for detecting suspected code plagiarism based on a random forest model

A random forest model applied to program code, in the fields of computing and character and pattern recognition, to address the unverified accuracy, low discriminative power, and unstable feature data of earlier approaches.

Active Publication Date: 2019-01-08
DONGHUA UNIV

AI Technical Summary

Problems solved by technology

However, as a traditional neural network, the BP neural network has only a single hidden layer. When it is trained on the feature set, the per-feature weights it learns do not achieve the ideal prediction effect, and its accuracy has yet to be verified.
Moreover, among the 7 feature values collected by that invention's BP neural network model, some are unstable, such as code style similarity and statistical attribute similarity. These features differ only slightly in terms of program syntax. In addition, algorithm practice exercises require little code, so the model separates the classes poorly and cannot perform at its best.

Examples

Embodiment Construction

[0070] In order to make the present invention more comprehensible, preferred embodiments are described in detail below with accompanying drawings.

[0071] The random-forest-based method for detecting suspected code plagiarism provided by the present invention is implemented as follows:

[0072] Feature values are extracted from the code that students submit for each problem and from the related problem information; this is the data preparation stage. When each piece of code is processed, code and comments are separated, and irrelevant characters at the beginning and end of the code, such as newlines, indentation, and spaces, are removed, which simplifies later processing of the feature values. Nine attributes are then extracted as the inputs for model training. These nine attributes are: whether the maximum similarity between the student's code and other students' code exceeds the similarity threshold (CPMS), the...
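
The preprocessing and the first attribute (CPMS) described in [0072] can be illustrated with a minimal sketch. Several details here are assumptions rather than anything the patent text states: submissions are taken to use C-style comments, similarity is measured with Python's `difflib.SequenceMatcher`, and the threshold of 0.8 and the function names `split_code_and_comments` and `cpms` are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Assumption: submissions use C-style comments (// ... and /* ... */).
# This naive pattern would also match comment markers inside string literals.
_COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def split_code_and_comments(source):
    """Separate comments from code and strip surrounding whitespace."""
    comments = [m.group(0) for m in _COMMENT_RE.finditer(source)]
    code = _COMMENT_RE.sub("", source)
    # Remove indentation, trailing spaces, and blank lines.
    lines = [line.strip() for line in code.splitlines()]
    code = "\n".join(line for line in lines if line)
    return code, comments


def cpms(target_code, other_codes, threshold=0.8):
    """Return 1 if the maximum similarity to any other submission
    exceeds the threshold, else 0. The threshold value is hypothetical."""
    if not other_codes:
        return 0
    best = max(SequenceMatcher(None, target_code, other).ratio()
               for other in other_codes)
    return int(best > threshold)


if __name__ == "__main__":
    raw = "int a = 1; // counter\n\n   /* block */ return a;\n"
    code, comments = split_code_and_comments(raw)
    print(code)
    print(comments)
    print(cpms(code, ["int a = 1;\nreturn a;", "print('hello')"]))
```

A character-level ratio is only a stand-in; any code similarity measure could be substituted in `cpms` without changing the shape of the feature.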

Abstract

The invention relates to a method for detecting suspected code plagiarism based on a random forest model. Broadly, the method has two stages: a feature extraction stage, followed by a model training and prediction stage. The method collects feature data for the two pieces of code being compared, feature data for the users under examination, and attributes of the related problems, and applies a random forest algorithm to determine whether the current user is suspected of plagiarism.
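
The second stage (model training and prediction) can be sketched as a small random forest written with only the Python standard library: each tree is grown on a bootstrap sample, each node considers a random subset of features, and the forest predicts by majority vote. This is an illustrative sketch, not the patent's implementation. The three feature columns in the usage example, the tree depth, and the number of trees are all hypothetical, and in practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally be used.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) with the lowest weighted Gini,
    or None if no split improves on the unsplit node."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, rng, max_depth=3):
    """Grow one tree; each node sees a random subset of sqrt(d) features.
    Leaves are plain labels (assumed not to be tuples); nodes are tuples."""
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))
    feature_ids = rng.sample(range(n_features), k)
    split = best_split(rows, labels, feature_ids) if max_depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], rng, max_depth - 1),
            build_tree([r for r, _ in right], [y for _, y in right], rng, max_depth - 1))


def predict_tree(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical columns: [CPMS flag, max similarity, number of similar peers]
    X = [[0, 0.20, 0], [0, 0.35, 1], [0, 0.30, 0], [0, 0.25, 1],
         [1, 0.90, 3], [1, 0.95, 4], [1, 0.85, 2], [1, 0.92, 5]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = suspected plagiarism
    forest = train_forest(X, y)
    print(predict_forest(forest, [1, 0.99, 6]))
    print(predict_forest(forest, [0, 0.10, 0]))
```

The toy data is deliberately separable on every column so that the vote is unambiguous; real feature data would be noisier and would call for a held-out evaluation set.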

Description

Technical field

[0001] The invention relates to a method for detecting suspected code plagiarism based on a random forest model, and belongs to the field of applied machine learning.

Background

[0002] With the rapid development of computer technology, a large number of people have entered the computer industry, and the integrity problems that accompany this surge of practitioners cannot be ignored. From programming assignments in computer science to key software engineering products, cloning and plagiarism of program code is becoming increasingly serious. The main plagiarism methods are: (1) copying without changes; (2) modifying comments; (3) modifying identifiers; (4) adjusting variable positions; (5) combining procedures; (6) adjusting statement positions; (7) adjusting control structure logic, etc. In addition, some scholars have conjectured and constructed other means of plagiarism i...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F8/75, G06K9/62
CPC: G06F8/751, G06F18/22, G06F18/24323
Inventor: 黄秋波, 方国正, 汤景东
Owner: DONGHUA UNIV