Code similarity detection method and device and storage medium

A code similarity detection technology, applied in the fields of instruments, software maintenance/management, reverse engineering, etc. It addresses the problems that code files are not characteristically represented and the specific functions of files are not reflected, both of which affect the accuracy of results. Its benefits are low language-migration cost, ease of implementation, and improved accuracy.

Active Publication Date: 2021-03-30
北京北大软件工程股份有限公司

AI Technical Summary

Problems solved by technology

In the first case, the many comments and certain lines of code do not characterize the functionality of the code file and therefore have a large impact on the accuracy of the results.
[0011] For the line coverage phenomenon, if the weight of the lines that

Examples

Example Embodiment

[0047] To make the objects, technical solutions, and advantages of the present invention clearer, various embodiments of the present invention are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that many technical details are set forth in the embodiments so that the reader can better understand the present invention. However, even without these technical details, and with variations and modifications based on the following embodiments, the claimed technical solutions can still be implemented. The division into the following embodiments is for convenience of description and should not limit the specific implementations of the present invention; the embodiments may be combined and cross-referenced where they do not contradict one another.

[0048] The present invention will be described in detail below with reference to the accompanying drawings and specific examples.

[0049] In order to eliminate the effects of s...



Abstract

The embodiment of the invention relates to the field of software detection and discloses a code similarity detection method. The method comprises three stages. In the preprocessing stage, massive source code files are preprocessed, features are extracted, and similarity-hash fingerprint values are output. In the fingerprint indexing stage, the fingerprints from the previous stage are segmented and recombined using a segmented indexing strategy, stored in a similarity-hash fingerprint library, and indexed by segment to facilitate quick matching. In the similarity matching stage, a similarity-hash value is generated for the project file to be detected, and a traceability detection result is retrieved segment by segment from the similarity-hash fingerprint library according to that value. By eliminating lines that are common across different languages, the method reduces the influence of the line coverage problem on the result.
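The three stages described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the 64-bit fingerprint width, the four 16-bit segments, the Hamming threshold of 3, the line-level features, the comment filter, and all names (`features`, `simhash`, `FingerprintLibrary`) are assumptions chosen for illustration. The segment index relies on the pigeonhole principle: two fingerprints that differ in at most 3 bits must agree exactly on at least one of 4 segments.

```python
import hashlib
from collections import defaultdict

BITS = 64                      # assumed fingerprint width
SEGMENTS = 4                   # assumed number of index segments
SEG_BITS = BITS // SEGMENTS
MAX_DISTANCE = 3               # assumed threshold; must stay below SEGMENTS


def features(source):
    """Assumed preprocessing: normalise whitespace, drop blank and comment-only lines."""
    out = []
    for line in source.splitlines():
        line = " ".join(line.split())
        if not line or line.startswith(("#", "//")):
            continue
        out.append(line)
    return out


def simhash(feats):
    """Similarity hash: per-bit majority vote over the hashes of all features."""
    votes = [0] * BITS
    for f in feats:
        digest = hashlib.blake2b(f.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    fp = 0
    for i, v in enumerate(votes):
        if v > 0:
            fp |= 1 << i
    return fp


def segments(fp):
    """Split a fingerprint into (position, value) keys for the segment index."""
    mask = (1 << SEG_BITS) - 1
    return [(i, (fp >> (i * SEG_BITS)) & mask) for i in range(SEGMENTS)]


def hamming(a, b):
    return bin(a ^ b).count("1")


class FingerprintLibrary:
    """Hypothetical in-memory stand-in for the fingerprint library."""

    def __init__(self):
        self.index = defaultdict(set)   # (position, segment value) -> fingerprints
        self.names = defaultdict(set)   # fingerprint -> file names

    def add(self, name, source):
        fp = simhash(features(source))
        self.names[fp].add(name)
        for key in segments(fp):
            self.index[key].add(fp)
        return fp

    def query(self, source):
        fp = simhash(features(source))
        candidates = set()
        for key in segments(fp):        # only fingerprints sharing a segment are checked
            candidates |= self.index.get(key, set())
        hits = []
        for c in candidates:
            d = hamming(fp, c)
            if d <= MAX_DISTANCE:
                hits.extend((name, d) for name in self.names[c])
        return sorted(hits, key=lambda item: (item[1], item[0]))
```

A caller would populate the library with `add(name, source)` for each known file and then call `query(source)` on the file under test; the result is a list of `(file name, Hamming distance)` pairs. A production system would need language-aware comment and common-line filtering rather than the simple prefix check used here.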

Description

Technical field

[0001] The invention relates to the field of software detection, in particular to a code similarity detection method, system, device and storage medium.

Background

[0002] With the increasing popularity of open source software, the amount of open source code is growing rapidly. In enterprises and research institutes alike, more and more developers copy and paste existing code to improve software development efficiency. However, as software is continuously updated and its functions keep expanding, the negative impact of this duplicated and cloned code on software quality, usability and maintainability becomes more and more prominent. Code introduced from open source projects reduces software developers' understanding and control of the overall software system. Conflicts may arise between external code and the code of the software system itself. Vulnerabilities in open source code may...


Application Information

IPC(8): G06F8/75
CPC: G06F8/751
Inventor: 高庆, 李玫, 张世琨, 马森
Owner: 北京北大软件工程股份有限公司