Duplicated code detecting method based on neural network language model

A code detection technology based on a language model, applied in biological neural network models, error detection/correction, software testing/debugging, etc. It addresses the inability of existing methods to detect duplicated code and the resulting economic loss to code creators, with the effects of protecting intellectual property rights, avoiding the curse of dimensionality, and preventing economic losses.

Active Publication Date: 2017-10-20
UNIV OF ELECTRONICS SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to solve the problem that prior-art duplicate code detection methods cannot detect duplicated code that has been modified without essential change, which lowers detection accuracy and can easily cause economic losses to the code creator. To this end, the invention provides a duplicate code detection method based on a neural network language model.

Examples

Embodiment 1

[0052] Select 260 apps from different Android application markets. Each app is analyzed manually, and a set of 100 clone codes is likewise determined manually;

[0053] Use step 1 to convert each code into its corresponding control flow graph (CFG);

[0054] For all CFGs, use step 2 to obtain the rooted subgraph of each node;

[0055] Use step 3 to learn the vector representation of each rooted subgraph;

[0056] Use step 4 to obtain the similarity between all CFGs;

[0057] Use step 5 to cluster all the CFGs; the codes corresponding to CFGs in the same cluster are duplicated codes. The clustering result is evaluated with the adjusted Rand index (ARI), which is 0.88.
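
The ARI used in [0057] compares a predicted clustering with a manually determined one, correcting for agreement expected by chance. The sketch below is a minimal pure-Python version of the standard adjusted Rand index formula; the function name and the toy label lists are illustrative assumptions and are not taken from the patent or its experiment.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)

    # Contingency counts: items per (true cluster, predicted cluster) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs that are grouped together in each view.
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    total_pairs = comb(n, 2)

    expected = sum_true * sum_pred / total_pairs if total_pairs else 0.0
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both clusterings have one cluster
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy labels: cluster ids are arbitrary, only the grouping matters.
same_grouping = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_grouping = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same_grouping, crossed_grouping)
```

A value of 1 means the two clusterings group the items identically, and values near 0 mean the agreement is about what random labeling would give.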

Abstract

The invention discloses a duplicated code detection method based on a neural network language model and belongs to the technical field of duplicated code detection. It solves the problem that prior-art detection methods cannot detect duplicated code that has been modified without essential change, which lowers detection accuracy and is likely to cause economic losses to code originators. The method comprises the following steps: 1, each code is converted into a corresponding control flow graph (CFG); 2, the rooted subgraph of each node in each CFG is extracted; 3, all rooted subgraphs are represented as vectors; 4, the vector representations of the rooted subgraphs are input into a deep graph kernel for learning, and the similarity between all CFGs is obtained; 5, the similarities between all CFGs are input into the affinity propagation (AP) clustering algorithm, which groups the CFGs into multiple clusters, and the codes corresponding to CFGs in the same cluster are duplicated codes. The method is used for finding duplicated codes.
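
Steps 2 to 4 can be illustrated with a small pure-Python sketch. It assumes each CFG is given as a successor map plus one label per node, builds a canonical string for the rooted subgraph of every node by repeatedly appending sorted successor labels (a Weisfeiler-Lehman style relabeling), and compares two CFGs by the cosine of their subgraph counts. The plain counts are a simplification standing in for the learned vectors and the deep graph kernel named in the abstract; all function names, node labels, and example graphs are hypothetical.

```python
from collections import Counter
from math import sqrt


def rooted_subgraph_labels(successors, node_labels, depth):
    """Map each node to a canonical string for its rooted subgraph of this depth."""
    current = dict(node_labels)  # depth 0: the node's own label
    for _ in range(depth):
        nxt = {}
        for node in node_labels:
            children = sorted(current[s] for s in successors.get(node, []))
            nxt[node] = current[node] + "(" + ",".join(children) + ")"
        current = nxt
    return current


def subgraph_bag(successors, node_labels, max_depth):
    """Count the rooted subgraphs of depth 0..max_depth over all nodes."""
    bag = Counter()
    for depth in range(max_depth + 1):
        bag.update(rooted_subgraph_labels(successors, node_labels, depth).values())
    return bag


def cosine_similarity(bag_a, bag_b):
    """Cosine of two count vectors (assumption: no learned embedding is applied)."""
    dot = sum(count * bag_b[key] for key, count in bag_a.items())
    norm_a = sqrt(sum(c * c for c in bag_a.values()))
    norm_b = sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical CFGs: an if/else diamond, a renamed copy of it, and a straight line.
labels_a = {"a0": "entry", "a1": "branch", "a2": "assign", "a3": "call", "a4": "return"}
succ_a = {"a0": ["a1"], "a1": ["a2", "a3"], "a2": ["a4"], "a3": ["a4"]}
labels_b = {"x": "entry", "y": "branch", "z": "assign", "w": "call", "v": "return"}
succ_b = {"x": ["y"], "y": ["w", "z"], "z": ["v"], "w": ["v"]}
labels_c = {"c0": "entry", "c1": "assign", "c2": "return"}
succ_c = {"c0": ["c1"], "c1": ["c2"]}

bag_a = subgraph_bag(succ_a, labels_a, max_depth=2)
bag_b = subgraph_bag(succ_b, labels_b, max_depth=2)
bag_c = subgraph_bag(succ_c, labels_c, max_depth=2)
sim_ab = cosine_similarity(bag_a, bag_b)
sim_ac = cosine_similarity(bag_a, bag_c)
print(sim_ab, sim_ac)
```

Because the canonical strings depend only on labels and structure, renaming the nodes of a CFG leaves its bag unchanged. A pairwise similarity matrix built this way is the kind of input a clustering step such as step 5 would consume.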

Description

Technical field

[0001] The invention discloses a duplicated code detection method based on a neural network language model, which is used for finding duplicated codes and belongs to the technical field of duplicated code detection.

Background

[0002] From the perspective of software engineering, code clones can be divided into three types. The first type is introduced by code reuse, which removes some of the repetitive work in software development; such code reflects good software design. The second type may lead to software bugs, for example when function names or variable names in the duplicated code are modified incorrectly or not modified at all. The third type does not directly cause bugs, but it significantly affects the later maintainability of the software. For example, when designing a system in the MVC mode, if there are duplicate codes between two subsystems, it means that the MVC layered and indepe...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F11/36, G06N3/02
CPC: G06F11/3608, G06N3/02
Inventor: 屈鸿, 符明晟, 涂强, 刘洋军, 张亦洲, 王一文, 高榕, 陈珊
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA