Similarity detection method of computer software source code

A source code similarity detection method, classified under program/content distribution protection. It addresses the complex and long execution, poor detection performance, and high error rate of some existing methods.

Active Publication Date: 2016-03-23
北京众码教育科技有限公司

AI Technical Summary

Problems solved by technology

[0008] The purpose of the present invention is to address problems encountered in code similarity detection, such as poor detection performance, the complex and long execution of some methods, high error rates in some cases, and difficulty of application across different programming languages. It proposes a computer software source code similarity detection method that performs word segmentation (tokenization) on the source code, partitions the result into blocks, and compares the differences between blocks to obtain the code similarity detection result.



Examples


Embodiment Construction

[0052] To make the purpose, technical solution, and advantages of the invention clearer, the embodiments of the invention are described in detail below with reference to the accompanying drawings. This embodiment is carried out on the basis of the technical solution of the present invention, and the detailed implementation and specific operating process are given, but the protection scope of the present invention is not limited to the following embodiments.

[0053] This method is not tied to any particular programming language. For convenience in describing the implementation in detail, two programs written in Python are used as examples here.

[0054] Consider the two programs shown in Figure 2 and Figure 3. Code one is the original code and code two is the similar code, which mainly applies the following modifications: (1) verbatim copying; (2) changes to comments and whitespace; (3) renaming identifiers; (4) Chang...
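As a rough illustration of why word segmentation makes modifications (1) to (3) easy to see through, the sketch below tokenizes Python source with the standard-library `tokenize` module, drops comments and layout tokens, and replaces each identifier with a placeholder in order of first use. The function name `normalized_tokens` and the `ID0`, `ID1`, ... placeholder scheme are assumptions made for this example and are not taken from the patent, whose own handling of variable tokens is not shown in this excerpt.

```python
import io
import keyword
import tokenize

# Token types that carry no meaning for this comparison. Dropping INDENT and
# DEDENT is a simplification made for this sketch only.
_SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def normalized_tokens(source: str) -> list[str]:
    """Tokenize Python source, ignoring comments/layout and renaming identifiers."""
    names: dict[str, str] = {}
    out: list[str] = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            # Map each distinct identifier to ID0, ID1, ... by first appearance.
            out.append(names.setdefault(tok.string, f"ID{len(names)}"))
        else:
            out.append(tok.string)
    return out


original = "total = 0  # running sum\nfor x in data:\n    total = total + x\n"
renamed = "s=0\nfor item in values :\n    s = s+item   # add\n"

# Both programs reduce to the same token sequence.
print(normalized_tokens(original) == normalized_tokens(renamed))
```

Because identifiers are numbered by first appearance, consistent renaming leaves the token sequence unchanged, while comments and spacing never reach the comparison at all.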


Abstract

The invention relates to a similarity detection method for computer software source code and belongs to the technical field of computer applications. The method comprises the following steps: first, a word segmentation (tokenization) operation is performed on the source code according to its programming language; second, specific marker words are selected to partition the segmentation result into blocks, and variable tokens are processed according to their variable attributes; third, based on the segmentation result, a difference measurement is performed on each pair of blocks to obtain a difference matrix, and the overall difference is obtained from the difference results and the correlation between blocks; finally, the code similarity detection result is obtained according to a formula. Compared with the prior art, the method can identify verbatim copying, changes to comments and whitespace, identifier renaming, data type changes, and similar modifications, and can also detect reordered code blocks, reordered statements, added redundant statements and variables, and the replacement of a control structure with an equivalent one.
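The block partitioning and difference matrix steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the marker set `BLOCK_MARKERS`, the use of `difflib.SequenceMatcher` as the per-block difference measure, and the length-weighted best-match aggregation are all choices made for this example, since the patent's own marker words and similarity formula do not appear in this excerpt.

```python
from difflib import SequenceMatcher

# Hypothetical marker words that open a new block (assumption for this sketch).
BLOCK_MARKERS = {"def", "class", "for", "while", "if"}


def split_blocks(tokens):
    """Partition a token list into blocks, starting a new block at each marker."""
    blocks, current = [], []
    for tok in tokens:
        if tok in BLOCK_MARKERS and current:
            blocks.append(current)
            current = []
        current.append(tok)
    if current:
        blocks.append(current)
    return blocks


def difference_matrix(blocks_a, blocks_b):
    """Entry [i][j] is the difference (0 = identical) between block i and block j."""
    return [
        [1.0 - SequenceMatcher(None, a, b, autojunk=False).ratio() for b in blocks_b]
        for a in blocks_a
    ]


def _directed_difference(blocks_a, blocks_b):
    # Each block of A is paired with its closest block in B, weighted by length.
    matrix = difference_matrix(blocks_a, blocks_b)
    total = sum(len(a) for a in blocks_a)
    return sum(len(a) * min(row) for a, row in zip(blocks_a, matrix)) / total


def similarity(tokens_a, tokens_b):
    """Overall similarity in [0, 1], averaged over both comparison directions."""
    blocks_a, blocks_b = split_blocks(tokens_a), split_blocks(tokens_b)
    if not blocks_a or not blocks_b:
        return 1.0 if blocks_a == blocks_b else 0.0
    diff = (_directed_difference(blocks_a, blocks_b)
            + _directed_difference(blocks_b, blocks_a)) / 2
    return 1.0 - diff


code_one = "def f ( x ) : return x + 1 def g ( y ) : return y * 2".split()
code_two = "def g ( y ) : return y * 2 def f ( x ) : return x + 1".split()
print(similarity(code_one, code_two))
```

Because each block is matched against its closest counterpart rather than the block in the same position, swapping the order of the two functions in this example does not lower the score.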

Description

Technical field

[0001] The invention relates to computer program analysis technology and a code similarity detection algorithm for computer software, in particular to an extensible code similarity detection algorithm based on source code word segmentation, block extraction, and multiple difference measurement methods. It belongs to the field of computer application technology.

Background

[0002] Code similarity detection technology is currently used mainly for code plagiarism detection. It is an important task in software development and maintenance and is widely applied in fields such as source code plagiarism detection, software component library query, software defect detection, and program understanding. It can help teachers detect plagiarism in students' programming assignments, and it also has practical value for establishing software copyright.

[0003] In the article "Metrics based pl...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F21/16
CPC: G06F21/16
Inventor: 嵩天, 田星, 李凤霞, 刘政祎
Owner: 北京众码教育科技有限公司