A Method of Duplicate Code Detection Based on Neural Network Language Model
A duplicate code detection technology based on a language model, applied in the fields of biological neural network models, error detection/correction, and software testing/debugging. It addresses problems such as the failure to detect duplicate code and the resulting economic loss to code creators, with the effects of protecting intellectual property rights, avoiding the curse of dimensionality, and preventing economic loss.
Active Publication Date: 2020-07-28
UNIV OF ELECTRONICS SCI & TECH OF CHINA
AI Technical Summary
Problems solved by technology
[0004] The purpose of the present invention is to solve the problem that prior-art duplicate code detection methods cannot detect duplicate code that has not undergone essential changes, which lowers detection accuracy and can easily cause economic loss to the code's creator. To this end, the invention provides a duplicate code detection method based on a neural network language model.
Examples
Embodiment 1
[0052] Select 260 apps from different Android application markets and analyze each one manually; a set of 100 cloned code samples is likewise determined manually;
[0053] Use step 1 to convert each piece of code into its corresponding control flow graph (CFG);
[0054] For all CFGs, use step 2 to obtain the root subgraph of each node;
[0055] Use step 3 to learn the vector representation of each root subgraph;
[0056] Use step 4 to obtain the similarity between all CFGs;
[0057] Use step 5 to cluster all the CFGs; the code corresponding to CFGs in the same cluster is duplicate code. The clustering result is measured with the Adjusted Rand Index (ARI), which has a value of 0.88.
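The ARI reported in [0057] compares a predicted clustering against the manually determined ground truth, corrected for chance agreement. The following is a minimal sketch of the standard pair-counting formula, not code from the patent; the function name `adjusted_rand_index` and the toy labels are illustrative assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)

    # Contingency counts: items per (true cluster, predicted cluster) cell.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs grouped together in each view.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    total_pairs = comb(n, 2)

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = sum_true * sum_pred / total_pairs if total_pairs else 0.0
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy labels: cluster ids are arbitrary, only the grouping matters.
ground_truth = [0, 0, 1, 1]
predicted = [1, 1, 0, 0]
score = adjusted_rand_index(ground_truth, predicted)
```

A value of 1 means the two groupings are identical up to relabeling, values near 0 indicate chance-level agreement, and negative values indicate agreement worse than chance.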
Abstract
The invention discloses a duplicate code detection method based on a neural network language model, which belongs to the technical field of duplicate code detection. It solves the problem that prior-art duplicate code detection methods cannot detect duplicate code that has not undergone essential changes, which results in low detection accuracy and can easily cause economic loss to the code's creator. The method comprises: step 1, converting each piece of code into a corresponding control flow graph (CFG); step 2, extracting the root subgraph of each node in each CFG; step 3, representing all root subgraphs as vectors; step 4, inputting the vector representations of the root subgraphs into a deep graph kernel function for learning, to obtain the similarity between all CFGs; step 5, inputting the similarities between CFGs into the affinity propagation (AP) clustering algorithm to cluster the CFGs into multiple clusters, where the code corresponding to the CFGs in the same cluster is duplicate code. The invention is used to find duplicate code.
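Steps 2 to 4 can be illustrated with a deliberately simplified sketch. It assumes a CFG is given as a successor map plus one statement-kind label per node, labels each root subgraph with a Weisfeiler-Lehman style relabeling, and compares two CFGs with a cosine-normalized count kernel. This is a stand-in: the patent learns vector representations of root subgraphs and uses a deep graph kernel, neither of which is reproduced here. All function names and toy graphs are hypothetical.

```python
from collections import Counter
from math import sqrt


def root_subgraph_labels(successors, node_labels, depth):
    """Return canonical labels of the subgraphs rooted at every node,
    for every depth from 0 to `depth` (assumed encoding, not the patent's)."""
    current = dict(node_labels)
    all_labels = list(current.values())
    for _ in range(depth):
        nxt = {}
        for node in successors:
            # Combine the node's label with its successors' sorted labels.
            children = sorted(current[s] for s in successors[node])
            nxt[node] = f"{current[node]}({','.join(children)})"
        current = nxt
        all_labels.extend(current.values())
    return all_labels


def similarity(labels_a, labels_b):
    """Cosine-normalized count kernel over root-subgraph labels."""
    ca, cb = Counter(labels_a), Counter(labels_b)
    dot = sum(ca[k] * cb[k] for k in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


# Toy CFGs: cfg_b has the same shape as cfg_a with renamed nodes; cfg_c differs.
cfg_a = ({0: [1], 1: [2, 3], 2: [4], 3: [4], 4: []},
         {0: "entry", 1: "if", 2: "assign", 3: "call", 4: "return"})
cfg_b = ({"a": ["b"], "b": ["c", "d"], "c": ["e"], "d": ["e"], "e": []},
         {"a": "entry", "b": "if", "c": "assign", "d": "call", "e": "return"})
cfg_c = ({0: [1], 1: [2], 2: []},
         {0: "entry", 1: "assign", 2: "return"})

labels = [root_subgraph_labels(s, l, depth=2) for s, l in (cfg_a, cfg_b, cfg_c)]
sim_ab = similarity(labels[0], labels[1])  # structurally identical graphs
sim_ac = similarity(labels[0], labels[2])  # only partially overlapping graphs
```

Because the labels depend only on statement kinds and graph shape, renaming nodes leaves the similarity unchanged, which is the property that lets graph-based comparison tolerate superficial edits. The resulting pairwise similarity matrix is what step 5 would pass to the clustering algorithm.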
Description
Technical field
[0001] The invention discloses a duplicate code detection method based on a neural network language model, which is used for finding duplicate code and belongs to the technical field of duplicate code detection.
Background technique
[0002] From the perspective of software engineering, code clones can be divided into three types. The first type is introduced through code reuse, which removes some of the repetitive work in software development; such code reflects good software design. The second type of duplicate code may lead to software bugs, for example when function names or variable names in the duplicated code are forgotten or modified incorrectly. The third type of duplicate code does not directly cause bugs, but it significantly harms the later maintainability of the software. For example, when designing a system in the MVC mode, if there are duplicate codes between two subsystems, it means that the MVC layered and indepe...
Application Information
Patent Type & Authority: Patents (China)
IPC(8): G06F11/36; G06N3/02
CPC: G06F11/3608; G06N3/02
Inventor: 屈鸿, 符明晟, 涂强, 刘洋军, 张亦洲, 王一文, 高榕, 陈珊
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA

