Method for detecting code similarity based on semantic analysis of program source code

A detection method based on semantic analysis, applied to the detection of duplicate code with similar semantics. It addresses the low detection accuracy, high computational complexity, and inability to handle large-scale program code similarity detection found in existing methods, with the effect of narrowing the search space and improving accuracy.

Inactive Publication Date: 2010-04-21
HARBIN INST OF TECH
0 Cites, 77 Cited by

AI Technical Summary

Problems solved by technology

[0004] The present invention aims to solve the low similarity-detection accuracy and high computational complexity that existing duplicate code detection methods show for code with different syntactic representations but similar semantics, as well as their inability to support large-scale program code similarity detection. To this end, it proposes a code similarity detection method based on semantic analysis of program source code.

Examples

Specific Embodiment 1

[0030] Specific Embodiment 1. This embodiment is described with reference to Figures 1 to 8. A code similarity detection method based on semantic analysis of program source code is carried out through the following steps:

[0031] Step 1. Parse each of the two source code segments to be compared into a control dependence tree of its system dependence graph, yielding two control dependence trees;

[0032] Step 2. Apply basic code standardization to each of the two control dependence trees obtained in Step 1, yielding two control dependence trees with basic code standardization applied;

[0033] Step 3. Use a metric-based method to extract the candidate similar-code control dependence trees from the two standardized control dependence trees obtained in Step 2;

[0034] Step 4. Determine whether any candidate similar-code control dependence tree was extracted. If so, perform Step 5; if not, end the similarity detecti...
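
The candidate extraction and early exit in Steps 3 and 4 can be sketched roughly as below. This is a minimal illustration, not the patented procedure: the `CDTNode` type, the three metrics (node count, depth, number of control nodes), and the `tolerance` and `min_nodes` parameters are all assumptions made for the example, since the visible text does not say which metrics are used. Parsing (Step 1) and standardization (Step 2) are taken as already done.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class CDTNode:
    """Hypothetical control dependence tree node: a statement kind plus
    the statements that are control dependent on it."""
    kind: str  # e.g. "entry", "if", "while", "for", "assign", "return"
    children: List["CDTNode"] = field(default_factory=list)


def metrics(tree: CDTNode) -> Tuple[int, int, int]:
    """Return (node count, depth, control-node count) for a tree.
    The choice of these three metrics is an assumption."""
    control_kinds = {"if", "while", "for"}
    count, depth, control = 1, 1, int(tree.kind in control_kinds)
    for child in tree.children:
        c_count, c_depth, c_control = metrics(child)
        count += c_count
        depth = max(depth, 1 + c_depth)
        control += c_control
    return count, depth, control


def subtrees(tree: CDTNode) -> Iterator[CDTNode]:
    """Yield the tree itself and every subtree below it."""
    yield tree
    for child in tree.children:
        yield from subtrees(child)


def candidate_pairs(tree_a: CDTNode, tree_b: CDTNode,
                    tolerance: int = 0, min_nodes: int = 3):
    """Step 3 (sketch): pair subtrees whose metric vectors differ by at
    most `tolerance` in every component. Tiny subtrees are ignored."""
    sized_b = []
    for b in subtrees(tree_b):
        mb = metrics(b)
        if mb[0] >= min_nodes:
            sized_b.append((b, mb))

    pairs = []
    for a in subtrees(tree_a):
        ma = metrics(a)
        if ma[0] < min_nodes:
            continue
        for b, mb in sized_b:
            if all(abs(x - y) <= tolerance for x, y in zip(ma, mb)):
                pairs.append((a, b))
    return pairs


def detect(tree_a: CDTNode, tree_b: CDTNode):
    """Step 4 (sketch): stop early when no candidates were extracted."""
    pairs = candidate_pairs(tree_a, tree_b)
    if not pairs:
        return []  # nothing to compare further
    return pairs   # the later steps would refine and score these pairs
```

Comparing cheap metric vectors first means the more expensive comparison only runs on the subtree pairs that survive the filter, which is how a metric stage can narrow the search space.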

Abstract

The invention discloses a method for detecting code similarity based on semantic analysis of program source code. It relates to computer program analysis technology and to methods for detecting duplicate code in computer software. The method addresses the low similarity-detection accuracy and high computational complexity of existing methods for code with different syntactic representations but similar semantics, as well as their inability to support large-scale program code similarity detection. The method comprises the following steps: parsing the two source code segments to be compared into two control dependence trees of the system dependence graph and applying basic code standardization to each; using a metric-based method to extract candidate similar-code control dependence trees from the standardized control dependence trees; applying an advanced code standardization operation to the extracted candidate similar code; and computing semantic similarity to obtain a similarity result, which completes the code similarity detection. The method can be applied to source code plagiarism detection, software component library query, software defect detection, program comprehension, and the like.

Description

Technical field

[0001] The invention relates to computer program analysis technology and to methods for detecting duplicate code in computer software, and in particular to a method for detecting duplicate code with similar semantics.

Background art

[0002] Duplicate code (also known as clone code) detection is an important task in software development and maintenance. It is widely applied in source code plagiarism detection, software component library query, software defect detection, program understanding, and similar areas.

[0003] Existing duplicate code detection methods fall mainly into four groups: text-based methods, structural analysis methods, metric-based methods, and similar-subgraph-based methods. The first two can only detect code that is identical or has only minor changes, such as identifier renaming or comment changes. Although the method based on metrics is simple in calculation and low in co...
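
The limitation described for the first two groups of methods can be illustrated with a small token-level comparison. The sketch below is an assumption-laden example rather than anything from the patent: it uses Python's standard `tokenize` module on Python source, and the function name `normalized_tokens` is hypothetical. It drops comments and layout and replaces every identifier with a placeholder, so two fragments compare equal only when they differ by renaming or comments.

```python
import io
import keyword
import tokenize


def normalized_tokens(source: str):
    """Return the token strings of `source` with comments and layout
    tokens dropped and every non-keyword identifier replaced by 'ID'."""
    skip = {
        tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
        tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
    }
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in skip:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")  # hide the concrete identifier name
        else:
            out.append(tok.string)
    return out


# Same loop with renamed variables and an extra comment.
original = "total = 0\nfor x in items:\n    total = total + x  # sum\n"
renamed = "s = 0\nfor v in data:\n    s = s + v\n"

# Same computation written with a different loop construct.
rewritten = (
    "s = 0\ni = 0\nwhile i < len(data):\n"
    "    s = s + data[i]\n    i = i + 1\n"
)

same_after_renaming = normalized_tokens(original) == normalized_tokens(renamed)
same_after_rewrite = normalized_tokens(original) == normalized_tokens(rewritten)
```

Here the renamed fragment matches the original, while the `while`-based rewrite does not, even though it computes the same sum. Code of that second kind, with a different syntactic form but similar semantics, is what the invention targets.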

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/44, G06F17/30
Inventors: 王甜甜, 马培军, 苏小红, 王宇颖
Owner: HARBIN INST OF TECH