A fast program code similarity comparison method based on an abstract syntax tree

Abstract syntax tree and program code technology, applied in the field of code reuse. It addresses the low efficiency of existing methods and their difficulty in handling similarity comparison over massive code, and achieves good time and space complexity, a high recall rate, and a high accuracy rate.

Active Publication Date: 2019-03-08
BEIJING INST OF COMP TECH & APPL
Cites: 5; Cited by: 28

AI Technical Summary

Problems solved by technology

At present, there are many methods and techniques for program code similarity comparison, and some of them achieve high accuracy and recall in specific scenarios. However, these methods are all inefficient and struggle to meet the need for similarity comparison over massive code.

Examples

Embodiment Construction

[0032] In order to make the purpose, content, and advantages of the present invention clearer, specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples.

[0033] The fast program code similarity comparison method based on an abstract syntax tree provided by the present invention, as shown in Figure 1, includes the following steps:

[0034] The first step is to build an abstract syntax tree.

[0035] Program code similarity comparison is a process of analyzing program source code. Since source code is in essence a text file, analyzing it directly yields limited information, requires too much computation, and gives low accuracy. Therefore, before the similarity comparison is performed, the source program needs to be converted into an intermediate form for further processing.
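As a minimal illustration of converting source text into an intermediate tree form, the sketch below uses Python's standard `ast` module. The source does not name a parser or a target programming language, so the choice of Python and the helper name `node_type_sequence` are assumptions made only for illustration.

```python
import ast


def node_type_sequence(source):
    """Parse source text and list AST node type names in pre-order.

    Hypothetical helper: it keeps only structural information (node types)
    and drops identifiers and literal values.
    """
    tree = ast.parse(source)
    names = []

    def visit(node):
        names.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return names


# Two statements that differ only in identifier names.
seq_a = node_type_sequence("total = price * count")
seq_b = node_type_sequence("x = y * z")
print(seq_a == seq_b)
```

Compared as raw text, the two statements share almost nothing; compared as trees of node types, they are identical, which is the kind of information a direct text analysis cannot provide.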

[0036] The abstract syntax tree is an inter...

Abstract

The invention relates to a fast program code similarity comparison method based on an abstract syntax tree, in the technical field of code reuse. The method constructs the abstract syntax tree of a program, extracts program code features from the abstract syntax tree, hashes the code features, and finally judges code similarity by calculating the Hamming distance between the code feature hash values. It takes the abstract syntax tree of the program code as the comparison object and combines Simhash with inverted index technology, transforming the similarity comparison of program code into a comparison of code feature hash values. While maintaining high accuracy and recall, the method achieves fast similarity comparison of program code and meets the need for fast similarity comparison over massive code. It has better time and space complexity and can be applied in scenarios that require similarity comparison of massive code, thus providing support for software code reuse and traceability.
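The pipeline named in the abstract (features from the abstract syntax tree, Simhash, Hamming distance, inverted index) can be sketched roughly as follows. This is not the patented implementation: the feature definition (a node type plus its children's types), the 64-bit fingerprint width, the MD5-based feature hash, and the four-band index layout are all assumptions chosen to keep the example small, and every function name is hypothetical.

```python
import ast
import hashlib
from collections import defaultdict

BITS = 64   # fingerprint width (assumption; the source does not state one)
BANDS = 4   # index bands of 16 bits each (assumption)


def ast_features(source):
    """Yield one feature per AST node: its type plus its children's types."""
    for node in ast.walk(ast.parse(source)):
        children = [type(c).__name__ for c in ast.iter_child_nodes(node)]
        yield type(node).__name__ + "(" + ",".join(children) + ")"


def simhash(features):
    """Standard Simhash: each hashed feature votes +1 or -1 on every bit."""
    votes = [0] * BITS
    for feature in features:
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, vote in enumerate(votes):
        if vote > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def bands(fingerprint):
    """Split a fingerprint into (band position, band value) index keys."""
    width = BITS // BANDS
    mask = (1 << width) - 1
    return [(i, (fingerprint >> (i * width)) & mask) for i in range(BANDS)]


index = defaultdict(set)  # inverted index: band key -> names of code units


def add_to_index(name, fingerprint):
    for key in bands(fingerprint):
        index[key].add(name)


def candidates(fingerprint):
    """Return names sharing at least one band with the query fingerprint."""
    found = set()
    for key in bands(fingerprint):
        found |= index.get(key, set())
    return found


code_a = "def area(w, h):\n    return w * h\n"
code_b = "def size(x, y):\n    return x * y\n"  # same structure, renamed
code_c = "for i in range(10):\n    print(i)\n"

fp_a, fp_b, fp_c = (simhash(ast_features(s)) for s in (code_a, code_b, code_c))
add_to_index("code_a", fp_a)
add_to_index("code_c", fp_c)

print(hamming(fp_a, fp_b), hamming(fp_a, fp_c))
print(candidates(fp_b))
```

With four bands, any two fingerprints that differ in at most three bits must agree on at least one whole band, so the index lookup returns every such pair as a candidate and only those candidates need an exact Hamming distance check. The similarity threshold itself is a tuning choice that the text shown here does not specify.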

Description

Technical field

[0001] The invention relates to the technical field of code reuse, and in particular to a fast comparison method for program code similarity based on an abstract syntax tree.

Background

[0002] Code reuse is the use of existing software code components to construct new software systems. Reusable software code components are generally referred to as reusable components. Whether the reusable code is used unchanged or after appropriate modification, as long as it is used to construct new software, it can be called a reusable component.

[0003] As an important means of improving the efficiency and quality of software development, the development model based on code reuse has become the mainstream of software development, but it also brings great challenges. The software may contain components or code of multiple types or sources at the same time, such as component code developed within the organization, code developed by software outsourcing, componen...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F8/70
CPC: G06F8/70
Inventor: 陶金龙, 冯大成, 李雅斯, 高昕睿, 高艳鹍
Owner: BEIJING INST OF COMP TECH & APPL