A Method of Duplicate Code Detection Based on Neural Network Language Model

A language model and code detection technology, applied in the fields of biological neural network models, error detection/correction, and software testing/debugging. It addresses problems such as the failure to detect duplicate code and the resulting economic loss to code creators, with the effects of protecting intellectual property rights, avoiding the curse of dimensionality, and preventing economic loss.

Active Publication Date: 2020-07-28
UNIV OF ELECTRONICS SCI & TECH OF CHINA
Cites: 2; Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to solve the problem that duplicate code detection methods in the prior art cannot detect duplicate code that has not undergone essential changes, which results in low detection accuracy and can easily cause economic losses to the code creator. To this end, the invention provides a duplicate code detection method based on a neural network language model.

Examples

Embodiment 1

[0052] Select 260 apps from different Android application markets. Each app is analyzed manually, and a set of 100 clone codes is likewise determined manually;

[0053] Use step 1 to convert each code into its corresponding control flow graph (CFG);

[0054] For all CFGs, use step 2 to obtain the root subgraph of each node;

[0055] Use step 3 to learn the vector representation of each root subgraph;

[0056] Use step 4 to obtain the similarity between all CFGs;

[0057] Use step 5 to cluster all the CFGs; the codes corresponding to the CFGs in the same cluster are duplicate codes. The clustering result is measured with the adjusted Rand index (ARI), which has a value of 0.88.
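The ARI measurement mentioned in [0057] can be illustrated with a short calculation. The following is a minimal sketch of the standard adjusted Rand index, not the evaluation code used for the embodiment; the function name `adjusted_rand_index` and the toy label lists are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items.

    1.0 means the two clusterings agree exactly (up to renaming of
    cluster ids); values near 0 mean agreement at chance level.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Contingency counts: how many items fall in (true cluster, predicted cluster).
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs placed together by both / by each clustering.
    together_both = sum(comb(c, 2) for c in cell_counts.values())
    together_true = sum(comb(c, 2) for c in true_counts.values())
    together_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = together_true * together_pred / comb(n, 2)
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (together_both - expected) / (maximum - expected)


# Hypothetical toy data: manually determined clone groups vs. predicted clusters.
manual = [0, 0, 1, 1, 2, 2]
predicted = [5, 5, 7, 7, 9, 9]  # same grouping, different cluster ids
print(adjusted_rand_index(manual, predicted))
```

Because the index compares pairs of items rather than cluster ids, the manually determined clone groups and the predicted clusters do not need to share a numbering scheme.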

Abstract

The invention discloses a duplicate code detection method based on a neural network language model, which belongs to the technical field of duplicate code detection methods. It solves the problem that duplicate code detection methods in the prior art cannot detect duplicate code that has not undergone essential changes, which results in low detection accuracy and can easily cause economic losses and other problems for the code creator. The present invention includes step 1: converting each code in all codes into a corresponding control flow graph (CFG); step 2: extracting the root subgraph of each node in each CFG; step 3: representing all root subgraphs as vectors; step 4: inputting the vector representations of the root subgraphs into a deep graph kernel function for learning, to obtain the similarity between all CFGs; step 5: inputting the similarity between CFGs into the affinity propagation (AP) clustering algorithm to cluster the CFGs into multiple clusters, where the codes corresponding to the CFGs in the same cluster are duplicate codes. The present invention is used to find duplicate codes.
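Steps 2 and 4 can be illustrated roughly as follows. This is a simplified sketch under stated assumptions, not the patented method: each root subgraph is encoded as a string signature by Weisfeiler-Lehman-style relabeling, and a plain cosine similarity over signature counts stands in for the learned vectors and the deep graph kernel. The CFG encoding (a dict mapping each node to its successor list), the node labels, and all function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def root_subgraph_signatures(cfg, labels, depth):
    """Count signatures of the subgraphs rooted at every node, up to `depth`.

    cfg:    dict mapping each node to a list of successor nodes
            (every node must appear as a key).
    labels: dict mapping each node to a label such as a statement kind.
    """
    current = dict(labels)             # depth-0 signature is the node label
    counts = Counter(current.values())
    for _ in range(depth):
        deeper = {}
        for node, successors in cfg.items():
            # Extend the signature with the sorted signatures of the successors.
            below = sorted(current[s] for s in successors)
            deeper[node] = f"{current[node]}({','.join(below)})"
        current = deeper
        counts.update(current.values())
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two signature-count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical CFGs: `cfg_b` has the same shape as `cfg_a` but different
# node ids, as after a purely cosmetic edit; `cfg_c` is a straight-line
# fragment with no branch.
cfg_a = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: []}
labels_a = {0: "entry", 1: "branch", 2: "assign", 3: "assign", 4: "return"}
cfg_b = {"a": ["b"], "b": ["c", "d"], "c": ["e"], "d": ["e"], "e": []}
labels_b = {"a": "entry", "b": "branch", "c": "assign", "d": "assign", "e": "return"}
cfg_c = {0: [1], 1: [2], 2: []}
labels_c = {0: "entry", 1: "assign", 2: "return"}

sig_a = root_subgraph_signatures(cfg_a, labels_a, depth=2)
sig_b = root_subgraph_signatures(cfg_b, labels_b, depth=2)
sig_c = root_subgraph_signatures(cfg_c, labels_c, depth=2)
print(cosine_similarity(sig_a, sig_b), cosine_similarity(sig_a, sig_c))
```

In this sketch a pairwise similarity matrix built with `cosine_similarity` would play the role of the step 4 output. For step 5, one possible option is scikit-learn's `AffinityPropagation` with `affinity="precomputed"`, although the source does not say which implementation was used.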

Description

Technical field

[0001] The invention discloses a duplicate code detection method based on a neural network language model, which is used for finding duplicate code and belongs to the technical field of duplicate code detection methods.

Background technology

[0002] From the perspective of software engineering, code clones can be divided into three types. The first type is introduced through code reuse, which removes some of the repetitive work in software development; such code reflects good software design. The second type of duplicate code may lead to software bugs, for example when function names or variable names in the duplicated code are forgotten or modified incorrectly. The third type of duplicate code does not directly cause bugs, but it has a significant impact on the later maintainability of the software. For example, when designing a system in the MVC mode, if there are duplicate codes between two subsystems, it means that the MVC layered and indepe...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F11/36, G06N3/02
CPC: G06F11/3608, G06N3/02
Inventor: 屈鸿, 符明晟, 涂强, 刘洋军, 张亦洲, 王一文, 高榕, 陈珊
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA