
Approximate code clone detection method using GPU acceleration

A technology in the field of approximate code clone detection that improves detection efficiency. It addresses problems such as long detection times (4.5 days in one reported case) and the difficulty of maintaining high detection accuracy and high performance at the same time.

Active Publication Date: 2019-05-31
FUDAN UNIV
Cites: 10 | Cited by: 3

AI Technical Summary

Problems solved by technology

In 2016, researchers in the United States and Canada developed the clone detection tool SourcererCC[3], which uses a bag-of-tokens representation and heuristic filtering rules to detect approximate clones among code fragments with given boundaries in hundreds of millions of lines of code. Even so, detecting approximate clones at method granularity in 250 million lines of code still takes 4.5 days.
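The bag-of-tokens comparison behind SourcererCC can be illustrated with a minimal sketch. It assumes the overlap measure described for that tool (the number of shared tokens must reach a fraction θ of the larger bag); the function name `overlap_similar` and the example threshold are illustrative, and the index-based filtering heuristics that make the real tool fast are omitted.

```python
from collections import Counter


def overlap_similar(tokens_a: list[str], tokens_b: list[str], theta: float) -> bool:
    """Bag-of-tokens overlap check (sketch; SourcererCC's filtering heuristics omitted).

    Token order is ignored: each fragment becomes a multiset of tokens, and the
    fragments count as approximate clones when the multiset intersection covers
    at least theta times the size of the larger bag.
    """
    bag_a, bag_b = Counter(tokens_a), Counter(tokens_b)
    # Counter & Counter keeps each shared token with its minimum count.
    shared = sum((bag_a & bag_b).values())
    return shared >= theta * max(len(tokens_a), len(tokens_b))
```

For example, `int x = a + b ;` and `int y = a + b ;` share 6 of 7 tokens and pass at θ = 0.8, while `int x = a + b ;` and `return x ;` share only 2 and do not. Because every candidate pair needs such a comparison, the cost grows quickly with corpus size, which is consistent with the multi-day runtime reported above.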
In 2017, researchers developed CloneWorks[4], which applies a variety of strategies to improve detection efficiency on large code bases and was described as the fastest approximate clone detection tool at the time. However, it offers two different operating modes, conservative and aggressive, and it is difficult to maintain high detection accuracy and high performance at the same time.
However, this method has mainly been used in biological fields such as gene sequence analysis and has not yet been applied to clone detection in source code.



Examples


Embodiment Construction

[0030] The purpose, specific operation and advantages of the present invention can be further understood from the following description of its embodiments in conjunction with the accompanying drawings.

[0031] Figure 2 shows an example of the procedure for detecting Type-III approximate clones among three methods m1, m2 and m3. The present invention tokenizes all of the source code (including methods m1, m2 and m3) into a token string ①, formed by all tokens of the code in order, in which methods m1, m2 and m3 appear as three segments. It then builds a suffix array ② from the token string using GPU parallel acceleration, with the suffixes sorted as strings. Each element n of the array represents a suffix: the substring that starts at the n-th token of the token string and runs to the end of the token string of the entire code base. If the suffixes represented by two adjacent elements share the same prefix (that is, several...
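Steps ① and ② can be sketched sequentially in Python under several assumptions the text does not state: tokens are mapped to integer ids; a unique separator token (hypothetical `$SEPk`) follows each method so that no repeated run crosses a method boundary; the suffix array is built by prefix doubling, a common data-parallel construction that is not necessarily the one used in the patent; and adjacent suffixes are compared with Kasai's LCP algorithm. All function names and the `min_len` parameter (standing in for a candidate clone length threshold) are hypothetical.

```python
def build_token_string(methods: list[list[str]]) -> tuple[list[int], list[int]]:
    """Concatenate all methods' tokens into one id sequence (token string ①).

    A unique, hypothetical "$SEPk" separator follows each method, so a common
    prefix can never span two methods. owner[i] is the method index of
    position i, or -1 for a separator.
    """
    vocab: dict[str, int] = {}
    seq: list[int] = []
    owner: list[int] = []
    for m, tokens in enumerate(methods):
        for tok in tokens:
            seq.append(vocab.setdefault(tok, len(vocab)))
            owner.append(m)
        seq.append(vocab.setdefault(f"$SEP{m}", len(vocab)))
        owner.append(-1)
    return seq, owner


def suffix_array(seq: list[int]) -> list[int]:
    """Suffix array ② by prefix doubling (run on the CPU here).

    Each round sorts suffixes by (rank[i], rank[i + k]) and re-ranks them;
    both steps are bulk operations, which is why this family of algorithms
    parallelizes well on a GPU. Ids must be >= 0; -1 means "past the end".
    """
    n = len(seq)
    if n == 0:
        return []
    sa, rank, k = list(range(n)), list(seq), 1
    while True:
        keys = [(rank[i], rank[i + k] if i + k < n else -1) for i in range(n)]
        sa.sort(key=keys.__getitem__)
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (keys[sa[j]] != keys[sa[j - 1]])
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: suffixes fully sorted
            return sa
        k *= 2


def lcp_array(seq: list[int], sa: list[int]) -> list[int]:
    """Kasai's algorithm: lcp[r] = common-prefix length of suffixes sa[r-1] and sa[r]."""
    n = len(seq)
    rank = [0] * n
    for r, pos in enumerate(sa):
        rank[pos] = r
    lcp, h = [0] * n, 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and seq[i + h] == seq[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h:
            h -= 1
    return lcp


def repeated_fragments(seq: list[int], owner: list[int], min_len: int):
    """Report adjacent suffix pairs sharing at least min_len tokens.

    Returns (method_a, pos_a, method_b, pos_b, length) tuples. Only adjacent
    pairs are reported; a fuller tool would expand runs of equal LCP values.
    """
    sa = suffix_array(seq)
    lcp = lcp_array(seq, sa)
    hits = []
    for r in range(1, len(sa)):
        if lcp[r] >= min_len:
            a, b = sorted((sa[r - 1], sa[r]))
            hits.append((owner[a], a, owner[b], b, lcp[r]))
    return hits
```

With m1 = `a = b + c ;`, m2 = `x = b + c ;` and m3 = `return y ;`, calling `repeated_fragments(seq, owner, 3)` yields three hits, the longest being the five-token run `= b + c ;` shared by m1 and m2. Prefix doubling is used here because each round reduces to a sort plus an element-wise pass, which is the shape of work GPUs handle well.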



Abstract

The invention belongs to the technical field of software code analysis and relates in particular to a GPU-accelerated approximate code clone detection method. Source code from multiple software projects is tokenized and converted into a string consisting of a token sequence. The string is then constructed into a suffix array sorted by suffix, using a GPU (Graphics Processing Unit) parallel method. Finally, the cloned code fragments that are shorter than the code clone length threshold but not shorter than the candidate clone length threshold are sorted by their position in the code files using GPU parallel acceleration, and the method computes whether the fragments, once merged, meet the similarity threshold required for approximate clones. If the code similarity meets the threshold, the merged fragments are recorded as an approximate code clone. The method can rapidly detect approximate code clones at code-fragment granularity in massive code bases.
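The final merging step can be sketched as follows, with the caveat that the abstract does not define the similarity measure or the merging rule. This sketch assumes that fragments shared by two methods A and B are given as hypothetical `Fragment(start_a, start_b, length)` records, keeps only those whose length lies in [candidate threshold, clone threshold), sorts them by position in A, greedily chains those that also move forward in B, and defines similarity as matched tokens divided by the longer method's length. All names, the greedy chaining and the default threshold of 0.7 are illustrative assumptions.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """Exact-match fragment shared by methods A and B (hypothetical layout)."""
    start_a: int  # token offset inside method A
    start_b: int  # token offset inside method B
    length: int   # number of matching tokens


def candidates(frags: list[Fragment], candidate_len: int, clone_len: int) -> list[Fragment]:
    """Keep fragments too short to be clones alone but long enough to merge."""
    return [f for f in frags if candidate_len <= f.length < clone_len]


def merged_similarity(frags: list[Fragment], len_a: int, len_b: int) -> float:
    """Greedy merge (an assumption): sort by position in A, keep fragments that
    also advance in B without overlapping, return matched / longer length."""
    matched, end_a, end_b = 0, 0, 0
    for f in sorted(frags):  # NamedTuples sort by (start_a, start_b, length)
        if f.start_a >= end_a and f.start_b >= end_b:
            matched += f.length
            end_a, end_b = f.start_a + f.length, f.start_b + f.length
    return matched / max(len_a, len_b)


def is_approximate_clone(frags: list[Fragment], len_a: int, len_b: int,
                         threshold: float = 0.7) -> bool:
    """Record the pair as an approximate clone if merged similarity meets the threshold."""
    return merged_similarity(frags, len_a, len_b) >= threshold
```

For instance, with methods of 18 and 20 tokens and candidate fragments of 8 and 7 tokens in consistent order, the merged similarity is 15 / 20 = 0.75, so the pair passes a 0.7 threshold even though neither fragment would on its own. The greedy chain is simple but not optimal; a longest-chain search would find the best non-crossing subset.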

Description

Technical field

[0001] The invention belongs to the technical field of software code analysis and relates in particular to a GPU-accelerated approximate code clone detection method.

Background

[0002] Code clone detection is a technique for finding duplicated code fragments, identical or similar, in software source code. As early as the 1990s, researchers noticed the repetitive nature of software code. Code duplication, or code cloning, generally falls into four types. Type I (Type-1) clones are identical code fragments. Type II (Type-2) clones are parameterized clones: fragments that are identical except for identifiers, constants and types. Type III (Type-3) clones are similar fragments that contain additions, deletions or other modifications beyond those allowed for Type II clones. Type IV (Type-4) clones are semantically similar code: the syntactic structure may be completely different, but the code has similar semantics. Among the code clone dete...
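The difference between Type-1 and Type-2 clones is what token normalization targets. A minimal sketch, assuming C-like source and a hypothetical simplified lexer (`TOKEN_RE`, `KEYWORDS`, `TYPES` and the `$ID`/`$LIT`/`$TYPE` placeholders are illustrative and not taken from the patent, which does not say whether it normalizes tokens): raw tokens only match Type-1 clones, while replacing identifiers, literals and type names with placeholders makes Type-2 clones produce identical token strings.

```python
import re

# Hypothetical, simplified lexer for C-like code: string literals,
# identifiers/keywords, numbers, common two-char operators, then any other char.
TOKEN_RE = re.compile(
    r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+(?:\.\d+)?|==|!=|<=|>=|&&|\|\||\+\+|--|\S'
)
KEYWORDS = {"return", "if", "else", "for", "while"}  # kept verbatim
TYPES = {"int", "long", "float", "double", "char"}    # abstracted for Type-2


def normalize_token(tok: str) -> str:
    """Map identifiers, literals and type names to placeholders."""
    if tok in TYPES:
        return "$TYPE"
    if tok in KEYWORDS:
        return tok
    if tok[0] == '"' or tok[0].isdigit():
        return "$LIT"
    if tok[0].isalpha() or tok[0] == "_":
        return "$ID"
    return tok  # operators and punctuation are kept


def tokenize(code: str, normalize: bool = False) -> list[str]:
    """Split code into tokens; optionally normalize them for Type-2 matching."""
    tokens = TOKEN_RE.findall(code)
    return [normalize_token(t) for t in tokens] if normalize else tokens
```

For example, `int sum(int a, int b) { return a + b; }` and `float add(float x, float y) { return x + y; }` yield different raw token lists but identical normalized lists, so an exact-match search over normalized tokens treats them as clones.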


Application Information

Patent type & authority: Applications (China)
IPC (IPC8): G06F8/75
Inventors: 吴毅坚 (Wu Yijian), 彭鑫 (Peng Xin)
Owner: Fudan University