Programming language code duplicate checking method based on tree and sequence similarity

Applied to programming language code checking, this technique based on tree and sequence similarity addresses problems of existing methods relating to cost, time and space complexity, and detection accuracy. It improves checking accuracy and offers high efficiency and strong resistance to interference.

Active Publication Date: 2018-07-31
HUAQIAO UNIVERSITY
2 cites, 13 cited by

AI Technical Summary

Problems solved by technology

Statistics-based methods have low detection accuracy: they are too abstract, resist obfuscation poorly, and ignore the structural characteristics of the program, although their space complexity is low. Token-based methods, which rest mainly on text structure and lexical analysis, also have low detection accuracy, which depends mainly on how tokens are selected and extracted. Their resistance to obfuscation is low and they have difficulty with inserted redundant code, though they can resist obfuscation such as renaming variables and moving functions, and their time and space complexity is low. Tree-based methods generally have high detection accuracy, which depends mainly on how finely the tree is refined, and they resist obfuscation well. They take grammatical features into account, but have difficulty dealing with modifyi...


Examples


Specific embodiment

[0073] The code duplicate checking method is described below using program code 1 and program code 2. The specific implementation is as follows:

[0074] Step a: remove information in the code that interferes with the similarity comparison.

[0075] Table 2 shows the given program code 1 and program code 2 to be checked. Comments, console output, operators and other such information are removed from the programs, and the processed results are shown in Table 3. The processed results give the sequence of variables in program code 1 and program code 2 while preserving the structure of each program.
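As a rough illustration of step a, the following Python sketch strips comments, console output statements and operators from C-like source text. The function name `strip_interference`, the choice of C-like input, the `printf`/`puts` pattern and the operator set are all assumptions made for this example; the patent text available here does not specify them.

```python
import re


def strip_interference(source: str) -> str:
    """Remove comments, console output and operators from C-like code.

    Illustrative only: string literals are not parsed, so a literal that
    contains '//' or ';' would be handled incorrectly.
    """
    # Block comments /* ... */ and line comments // ...
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    source = re.sub(r"//[^\n]*", " ", source)
    # Console output statements (assumed to be printf/puts calls)
    source = re.sub(r"\b(?:printf|puts)\s*\([^;]*\)\s*;", " ", source)
    # Operators become spaces so neighbouring identifiers stay separate
    source = re.sub(r"[-+*/%=<>!&|^~?:]+", " ", source)
    # Normalise whitespace and drop lines that are now empty;
    # braces and parentheses are kept so the structure survives
    lines = (" ".join(line.split()) for line in source.splitlines())
    return "\n".join(line for line in lines if line)


sample = """
int add(int a, int b) { // sum
    /* debug */
    printf("%d", a + b);
    int c = a + b;
    return c;
}
"""
cleaned = strip_interference(sample)
```

Braces are left in place on purpose, since the later steps rely on the control structure of the program.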

[0076] Table 2

[0077]-[0078] (table content not reproduced in this extract)

[0079] Table 3

[0080]-[0081] (table content not reproduced in this extract)

[0082] Step b: construct a program structure tree according to the program structure.

[0083] A tree is built from the processed result. In the present embodiment, all leaf nodes are functions, denoted Fun. As shown in Figure 3, there are 6 leaf nodes in program code 1 and p...
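One possible way to represent such a structure tree is sketched below in Python. The `Node` class, its field names and the helpers `leaves` and `record_positions` are hypothetical, and the two-leaf example tree is invented for illustration; it is not the tree of program code 1 or program code 2.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                                   # e.g. "Program", "Fun"
    children: list[Node] = field(default_factory=list)
    # For leaf nodes: variable name -> positions where it occurs
    variables: dict[str, list[int]] = field(default_factory=dict)


def record_positions(tokens: list[str]) -> dict[str, list[int]]:
    """Record the index of every occurrence of each variable."""
    positions: dict[str, list[int]] = {}
    for index, name in enumerate(tokens):
        positions.setdefault(name, []).append(index)
    return positions


def leaves(node: Node) -> list[Node]:
    """Collect leaf nodes in left-to-right order."""
    if not node.children:
        return [node]
    found: list[Node] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


# Hypothetical program with two function leaves
root = Node("Program", [
    Node("Fun", variables=record_positions(["a", "b", "a"])),
    Node("Fun", variables=record_positions(["x"])),
])
fun_leaves = leaves(root)
```

Keeping the variable positions on the leaf itself means the later comparison can work leaf by leaf without consulting the rest of the tree.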


Abstract

The present invention discloses a programming language code duplicate checking method based on tree and sequence similarity. The method comprises: preprocessing the two pieces of program code to be compared, which includes removing text such as comments, console output statements and operators, and determining the content that is valid for duplicate checking; building a tree according to the control structure of the program and recording the positions of the variables in each leaf node of the tree; building a sequence of the relative positions of the variables in each leaf node, finding similar variables between functions based on that sequence, and then finding similar leaf nodes; and finally determining the similarity between the two pieces of code. The method removes the influence of irrelevant information on the check result and performs well against variable renaming, changes in function position and redundant code. A code duplicate checking system can be developed based on the method, which can improve checking efficiency and is useful for computer programming teaching in colleges.
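The abstract does not give the exact similarity measure, so the Python sketch below substitutes a simple stand-in to show why position-based sequences tolerate renaming and reordering. It assumes each leaf is a list of variable names in order of use, replaces every name by the rank of its first appearance, and compares leaves with `difflib.SequenceMatcher`. The function names, the canonical numbering and the best-match averaging are assumptions for illustration, not the patented procedure.

```python
from difflib import SequenceMatcher


def canonical(tokens):
    """Replace each variable by the rank of its first appearance.

    ["a", "b", "a"] and ["p", "q", "p"] both become [0, 1, 0],
    so renaming variables does not change the sequence.
    """
    first_seen = {}
    return [first_seen.setdefault(name, len(first_seen)) for name in tokens]


def leaf_similarity(leaf_a, leaf_b):
    """Similarity in [0, 1] between two leaves' canonical sequences."""
    matcher = SequenceMatcher(None, canonical(leaf_a), canonical(leaf_b),
                              autojunk=False)
    return matcher.ratio()


def code_similarity(leaves_a, leaves_b):
    """Average, over leaves of A, of the best-matching leaf in B.

    Taking the best match makes the score independent of leaf order,
    which mirrors tolerance to moved functions.
    """
    if not leaves_a or not leaves_b:
        return 0.0
    best = [max(leaf_similarity(a, b) for b in leaves_b) for a in leaves_a]
    return sum(best) / len(best)


# Hypothetical leaves: code_2 is code_1 with variables renamed
# and the two functions swapped
code_1 = [["a", "b", "a", "c"], ["x", "y", "x"]]
code_2 = [["p", "q", "p"], ["m", "n", "m", "k"]]
score = code_similarity(code_1, code_2)
```

Note that this averaging is asymmetric in its two arguments; a symmetric variant would need an explicit one-to-one matching of leaves.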

Description

Technical field

[0001] The invention relates to the field of data analysis and processing, in particular to a method for duplicate checking of programming language code based on tree and sequence similarity.

Background technique

[0002] Existing methods for program code duplicate checking include statistics-based, Token-based, tree-based and graph-based methods. Statistics-based methods have low detection accuracy: they are too abstract, resist obfuscation poorly, and ignore the structural characteristics of the program, although their space complexity is low. Token-based methods also have low detection accuracy, which depends mainly on how tokens are selected and extracted. Their resistance to obfuscation is low and they have difficulty with inserted redundant code, though they can resist obfuscation such as renaming variables and moving functions, the time and...


Application Information

IPC(8): G06F8/75
CPC: G06F8/751
Inventors: 李海波, 孙映川, 林汤权, 童俊成
Owner: HUAQIAO UNIVERSITY