Code clone detection method based on hash value, electronic device and storage medium

A hash-value-based detection method, applied in software engineering design and electrical digital data processing. It addresses the limitations of existing tools, which lack portability, can only handle a specific version of a language, and cannot analyze COBOL, and it achieves good portability.

Pending Publication Date: 2019-03-08
浙江网新恒天软件有限公司

AI Technical Summary

Problems solved by technology

[0003] To overcome the deficiencies of the prior art, one purpose of the present invention is to provide a hash-value-based code clone detection method. It addresses the following shortcomings of existing code clone detection tools: they cannot detect clones that differ in identifiers, cannot detect clones that differ in the number of lines, cannot analyze COBOL, run only on a specific platform and therefore lack portability, and support only a specific version of a language.

Embodiment Construction

[0024] The present invention is further described below with reference to the accompanying drawings and specific embodiments. Note that, provided they do not conflict, the embodiments and technical features described below can be combined arbitrarily to form new embodiments.

[0025] In one embodiment, given a software system to be checked, the set of all source code files it contains is F = [f_1, f_2, ..., f_m], where f_i denotes the i-th source code file. The source file f_i contains f_i(j) code fragments [c_1, c_2, ..., c_fi(n)], where a code fragment c is the smallest-granularity code block on which clone detection is performed, such as a paragraph in COBOL or a method in Java. The total number of code fragments in the system is denoted m, and the set of all fragments is C. A code clone is described as a pair w = (c_1, c_2). The set of all code clones in the sys...
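The notation in this paragraph can be pictured as a small data model. The sketch below is illustrative only: the class names `SourceFile` and `Fragment` and the helper functions are hypothetical and do not come from the patent text. It models F as a list of files, C as the flattened list of fragments, and a candidate clone w as a pair of fragments.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class Fragment:
    """One code fragment c, e.g. a COBOL paragraph or a Java method (hypothetical model)."""
    file: str
    name: str
    lines: tuple


@dataclass
class SourceFile:
    """One source file f_i together with the fragments extracted from it."""
    path: str
    fragments: list = field(default_factory=list)


def all_fragments(files):
    """Flatten F into the set C of all fragments in the system."""
    return [c for f in files for c in f.fragments]


def candidate_pairs(files):
    """Enumerate every possible pair w = (c_1, c_2) drawn from C."""
    return list(combinations(all_fragments(files), 2))
```

Enumerating every pair grows quadratically with the number of fragments, which is presumably why the method groups fragments by hash value before comparing them pairwise.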

Abstract

The invention provides a hash-value-based code clone detection method comprising the following steps: reading the source code and extracting the code base from the source code through an analyzer to obtain the clone detection objects; processing the clone detection objects with a hash algorithm to obtain their anonymous hash values, and clustering the anonymous hash values to obtain a number of code clone groups; and comparing the code within each group pairwise to determine whether the difference in line counts of the two pieces of code, or the number of differing code lines, is less than a threshold. If so, the pair is judged to be a code clone; otherwise it is not. The invention also relates to an electronic device and a readable storage medium for performing the method described above. The invention can detect code clones with identifier differences and code clones with line-count differences, can detect clones in COBOL source code, is not limited to a particular platform or language version, has good portability, and reduces overall complexity by clustering hash values.
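The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes that an "anonymous" hash means hashing each line after non-keyword identifiers are replaced by a placeholder, that grouping means bucketing fragments that share a line hash, and that the comparison counts differing lines. The names `KEYWORDS`, `anonymous_line_hash`, `group_by_hash`, and `detect_clones` are all hypothetical.

```python
import hashlib
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical keyword list; a real analyzer would use the language grammar.
KEYWORDS = {"if", "else", "return", "for", "while"}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def anonymous_line_hash(line):
    """Hash a line after replacing non-keyword identifiers with the placeholder ID."""
    tokens = []
    for tok in TOKEN.findall(line):
        is_word = tok[0].isalpha() or tok[0] == "_"
        tokens.append("ID" if is_word and tok not in KEYWORDS else tok)
    return hashlib.sha1(" ".join(tokens).encode("utf-8")).hexdigest()


def fingerprint(lines):
    """Anonymous hashes of the non-blank lines of one fragment."""
    return [anonymous_line_hash(line) for line in lines if line.strip()]


def group_by_hash(fingerprints):
    """Assumed grouping step: bucket fragment names that share a line hash."""
    buckets = defaultdict(set)
    for name, hashes in fingerprints.items():
        for h in hashes:
            buckets[h].add(name)
    return [names for names in buckets.values() if len(names) > 1]


def differing_lines(a, b):
    """Number of line hashes that appear in one fragment but not the other."""
    ca, cb = Counter(a), Counter(b)
    return sum(((ca - cb) + (cb - ca)).values())


def detect_clones(fragments, threshold=2):
    """Compare fragments pairwise within each group; keep pairs below the threshold."""
    fps = {name: fingerprint(lines) for name, lines in fragments.items()}
    clones = set()
    for group in group_by_hash(fps):
        for x, y in combinations(sorted(group), 2):
            if differing_lines(fps[x], fps[y]) < threshold:
                clones.add((x, y))
    return sorted(clones)
```

Because identifiers are replaced before hashing, two fragments that differ only in variable names produce identical line hashes, and the threshold lets fragments with a small number of added or removed lines still be reported as clones. Only fragments that share a bucket are compared, which avoids comparing every fragment against every other.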

Description

Technical field

[0001] The invention relates to the technical field of code clone detection, in particular to a hash-value-based code clone detection method, an electronic device, and a storage medium.

Background

[0002] A code clone refers to identical or similar code fragments that appear repeatedly in software source code. These fragments may be identical, or may have undergone editorial modifications (such as renaming variables) or logical modifications (such as being rewritten into code with similar functionality). Code fragments considered clones of each other often perform similar logical operations and achieve similar functions. Code clones are generally caused by copy-and-paste code reuse, and may also be caused by reusing software patterns. Code clones exist in all kinds of software systems, especially large-scale ones, and are closely related to many problems in software engineering. Code clone detection...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F8/75
CPC: G06F8/751
Inventor: 陈秋远, 杨朝晖, 李善平
Owner: 浙江网新恒天软件有限公司