Code clone detection method and application based on abstract syntax tree and token

A technique based on abstract syntax trees and tokens, applied in code compilation, program code conversion, unstructured text data retrieval, etc., that narrows the candidate range and improves both the accuracy and the efficiency of clone judgment.

Pending Publication Date: 2022-07-29
NANJING UNIV OF AERONAUTICS & ASTRONAUTICS

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to provide a code clone detection method, and an application thereof, based on abstract syntax trees and tokens, so that code clones can be found quickly and accurately.



Examples

Embodiment 1

[0032] As shown in Figure 1, an embodiment of the present invention provides a method for code clone detection based on abstract syntax trees and tokens. The method includes the following steps.

[0033] In step S101, all code is parsed into tokens and abstract syntax trees.

[0034] All code is split into function-level code blocks; each code block is numbered and its hash value is calculated. Each code block is then parsed into its tokens and its abstract syntax tree. For each code block, the tokens and their frequencies are calculated, along with the height and width of the corresponding abstract syntax tree.
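The indexing described in step S101 can be sketched as follows. This is a minimal illustration only, assuming the code under analysis is Python and using the standard `ast`, `tokenize`, and `hashlib` modules; the excerpt does not name a target language, a hash function, or exact definitions of tree height and width, so SHA-256, "height = nodes on the longest root-to-leaf path", "width = nodes on the widest level", and the helper names (`index_code_blocks`, `tree_height`, `tree_width`, `token_frequencies`) are all assumptions.

```python
import ast
import hashlib
import io
import tokenize
from collections import Counter

# Token types that carry no lexical content (layout and bookkeeping only).
_SKIPPED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def tree_height(node):
    """Number of nodes on the longest root-to-leaf path (assumed definition)."""
    children = list(ast.iter_child_nodes(node))
    return 1 + max((tree_height(child) for child in children), default=0)


def tree_width(node):
    """Number of nodes on the widest level of the tree (assumed definition)."""
    level, widest = [node], 0
    while level:
        widest = max(widest, len(level))
        level = [child for n in level for child in ast.iter_child_nodes(n)]
    return widest


def token_frequencies(source):
    """Count how often each token string occurs in a code block."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return Counter(tok.string for tok in tokens if tok.type not in _SKIPPED)


def index_code_blocks(source):
    """Split source into function-level blocks and record their features."""
    blocks = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            text = ast.get_source_segment(source, node)
            blocks.append({
                "id": len(blocks),  # sequential block number
                "hash": hashlib.sha256(text.encode("utf-8")).hexdigest(),
                "tokens": token_frequencies(text),
                "height": tree_height(node),
                "width": tree_width(node),
            })
    return blocks


SAMPLE = '''
def add(a, b):
    return a + b

def total(xs):
    s = 0
    for x in xs:
        s = s + x
    return s
'''

if __name__ == "__main__":
    for block in index_code_blocks(SAMPLE):
        print(block["id"], block["hash"][:8], block["height"], block["width"])
```

Storing the hash alongside the token counts and tree shape means that byte-identical blocks can be matched by a hash comparison alone, before any token or tree comparison is attempted.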

[0035] In step S102, code blocks that are not code clones are filtered out using the tokens, and candidate blocks of the same code clone type as the query block are selected using the abstract syntax tree.
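One way to picture the two-stage filtering in step S102 is the sketch below. The excerpt does not state the actual filtering criteria, so everything here is an assumption made for illustration: the token stage uses the overlap of token-frequency multisets, the tree stage compares abstract syntax tree height and width within a relative tolerance, and the names `token_overlap_ratio`, `shape_is_close`, `select_candidates`, `min_overlap`, and `shape_tolerance` are hypothetical.

```python
from collections import Counter


def token_overlap_ratio(query_tokens, candidate_tokens):
    """Shared token occurrences divided by the size of the larger block."""
    shared = sum((query_tokens & candidate_tokens).values())
    larger = max(sum(query_tokens.values()), sum(candidate_tokens.values()))
    return shared / larger if larger else 0.0


def shape_is_close(query_shape, candidate_shape, tolerance):
    """True when height and width each differ by at most the relative tolerance."""
    return all(
        abs(q - c) <= tolerance * max(q, c)
        for q, c in zip(query_shape, candidate_shape)
    )


def select_candidates(query, blocks, min_overlap=0.5, shape_tolerance=0.2):
    """Return ids of blocks that pass the token filter and the tree-shape filter.

    Both thresholds are illustrative defaults, not values from the source.
    """
    selected = []
    for block in blocks:
        if block["id"] == query["id"]:
            continue  # a block is not compared with itself
        # Stage 1 (tokens): discard blocks that share too few tokens.
        if token_overlap_ratio(query["tokens"], block["tokens"]) < min_overlap:
            continue
        # Stage 2 (AST): keep only blocks whose tree shape is similar.
        if shape_is_close(
            (query["height"], query["width"]),
            (block["height"], block["width"]),
            shape_tolerance,
        ):
            selected.append(block["id"])
    return selected


query = {"id": 0, "tokens": Counter({"a": 2, "+": 1}), "height": 5, "width": 4}
near = {"id": 1, "tokens": Counter({"a": 2, "-": 1}), "height": 5, "width": 4}
far = {"id": 2, "tokens": Counter({"x": 3}), "height": 5, "width": 4}
tall = {"id": 3, "tokens": Counter({"a": 2, "+": 1}), "height": 12, "width": 4}

if __name__ == "__main__":
    print(select_candidates(query, [query, near, far, tall]))
```

Running the cheap token comparison first means the tree comparison is only performed on blocks that survive it, which is the sense in which filtering narrows the candidate range.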

[0036] According to their degree of similarity, code clones are generally divided into four types, namely identical...


Abstract

The invention discloses a code clone detection method, and an application thereof, based on abstract syntax trees and tokens. The method comprises the following steps: parsing all code into tokens and abstract syntax trees; filtering out code blocks that are not code clones using the tokens, and selecting candidate blocks of the same code clone type as the query block using the abstract syntax trees; judging whether the lower bound of the similarity between a candidate block and the query block is higher than a preset threshold; and if so, forming a clone pair from the candidate block and the query block and outputting it. Because code blocks that are not clones are filtered out using the tokens and abstract syntax trees of the code blocks, the candidate range is narrowed, the efficiency of judging similarity between code snippets across different clone types is improved, and the judgment accuracy is improved in turn.

Description

Technical field

[0001] The present invention relates to the field of code clone detection, in particular to a code clone detection method, and an application thereof, based on abstract syntax trees and tokens.

Background

[0002] Code clones, also known as duplicate code or similar code, are two or more identical or similar pieces of source code in a code base. Code clones arise for many reasons, chiefly the reuse techniques that developers adopt to improve efficiency, including copying, pasting, and modifying existing code fragments, using development frameworks, and reusing design patterns.

[0003] A large number of empirical studies have shown that code clones are widespread in open source and closed source code repositories and account for a considerable proportion of the code. For example, some studies have found that code clones make up 22.3% of Linux systems, Kamiya et al. found that code clones make up 29% of JDK, and in some software systems it e...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F8/75, G06F8/41, G06F16/31, G06F16/33
CPC: G06F8/751, G06F8/4436, G06F16/322, G06F16/325, G06F16/3331
Inventors: 刘哲, 郭欣
Owner: NANJING UNIV OF AERONAUTICS & ASTRONAUTICS