Distributed code clone detection and search method based on sub-block filtering, and system and medium

A search method and distributed technology in the field of distributed code clone detection and search. It addresses problems such as a low detection level and the inability to handle similar-code retrieval and recommendation, and it overcomes insufficient detection efficiency to improve detection efficiency.

Active Publication Date: 2020-12-29
NAT UNIV OF DEFENSE TECH

AI Technical Summary

Problems solved by technology

Current code search tools, such as Google Code Search, can provide a lo...




Embodiment Construction

[0052] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments.

[0053] As shown in Figure 1, the distributed code clone detection and search method based on sub-block filtering in this embodiment includes the following steps:

[0054] 1) Group the user code and the code-library source code separately and preprocess each group in parallel. Divide the code into code blocks, convert it into symbols (Tokens), and count the frequency of each Token, producing intermediate files that contain the Tokens and their frequency information;

[0055] 2) Establish a global Token frequency table by summarizing, in parallel, the Tokens and frequency information from all intermediate files;

[0056] 3) Use the global Token frequency table to group and process the code blocks of the code-library source code in parallel, and establish an index to obtain t...
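Steps 1) to 3) can be sketched on a single machine in Python. This is a minimal illustration under stated assumptions, not the patented implementation. The following are all hypothetical:

* the regular-expression tokenizer;
* the in-memory dicts that stand in for intermediate files;
* the threshold `theta`;
* the helper names (`preprocess_group`, `merge_frequencies`, `sub_block`, `build_index`).

Because the embodiment text is truncated, the sub-block rule shown here is borrowed from the sub-block overlap filtering heuristic described for the SourcererCC clone detector. Tokens are ordered from rarest to most common using the global table. For a block of `size` Tokens, only the first `size - ceil(theta * size) + 1` of them are indexed.

```python
import math
import re
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical language-agnostic lexer: identifiers/keywords, integer
# literals, and any other single non-space character (operators, brackets).
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]")


def preprocess_group(group):
    """Step 1: tokenize one group of code blocks and count Token frequencies.

    `group` maps a block id to its source text; the returned dict stands in
    for one intermediate file.
    """
    bags = {bid: Counter(TOKEN_RE.findall(src)) for bid, src in group.items()}
    freq = Counter()
    for bag in bags.values():
        freq.update(bag)
    return {"bags": bags, "freq": freq}


def merge_frequencies(intermediates):
    """Step 2: sum every intermediate file's counts into a global table."""
    table = Counter()
    for inter in intermediates:
        table.update(inter["freq"])
    return table


def sub_block(bag, global_freq, theta):
    """Return a block's rarest Tokens, i.e. the prefix ('sub-block') that any
    block with similarity >= theta must share at least one Token with."""
    # Global order: ascending frequency, ties broken by the Token text.
    ordered = sorted(bag.elements(), key=lambda t: (global_freq[t], t))
    size = len(ordered)
    # Subtracting a tiny epsilon can only lengthen the prefix, which keeps
    # the filter conservative under floating-point rounding.
    prefix_len = size - math.ceil(theta * size - 1e-9) + 1
    return ordered[:prefix_len]


def build_index(intermediates, global_freq, theta=0.8):
    """Step 3: inverted index from sub-block Tokens to code-library block ids."""
    index = defaultdict(set)
    for inter in intermediates:
        for bid, bag in inter["bags"].items():
            for tok in sub_block(bag, global_freq, theta):
                index[tok].add(bid)
    return index


if __name__ == "__main__":
    library_groups = [
        {"lib/a.c#1": "int add(int x, int y) { return x + y; }"},
        {"lib/b.c#1": "int sum(int a, int b) { return a + b; }"},
    ]
    # Threads stand in for the distributed workers that process groups.
    with ThreadPoolExecutor() as pool:
        intermediates = list(pool.map(preprocess_group, library_groups))
    table = merge_frequencies(intermediates)
    index = build_index(intermediates, table)
    print(sorted(index))
```

In the described flow, the user-code intermediate files also feed the global table in step 2. They are left out of this demo for brevity. Python threads only illustrate the grouping. CPU-bound work at scale would more plausibly run in separate processes or on cluster nodes.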



Abstract

The invention discloses a distributed code clone detection and search method based on sub-block filtering, together with a system and a medium. The method comprises the following steps:

* the user code and the code-library source code are grouped separately and preprocessed in parallel to obtain intermediate files containing Tokens and their frequency information;
* a global Token frequency table is established from all the intermediate files;
* the code blocks of the code-library source code are grouped and processed in parallel using the global Token frequency table, and an index is established to obtain an index library;
* keywords are extracted from the user code through a sub-block filtering mechanism and used to query the index library, yielding a candidate set for each code block of the user code;
* for each code block in the user code, the similarity between that block and each candidate code block in its candidate set is calculated, and if the similarity exceeds a preset value, the two blocks are determined to be a clone pair.

The method is language-independent and achieves high detection and search efficiency. It is suitable for clone detection and search over large-scale code, and it supports user queries against the code base.
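The query and verification stages summarized in the abstract can be sketched end to end in Python. This is an illustrative assumption rather than the patent's method, and these choices are all hypothetical:

* blocks are compared as bags of Tokens;
* similarity is taken as the shared-Token count divided by the larger block size;
* sub-block keywords are the rarest Tokens of a block under the global frequency table, following the prefix-filtering idea used by SourcererCC;
* the preset value is `theta = 0.8`.

The sub-block filter only narrows the candidate set. Every candidate is still verified by computing its similarity.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical language-agnostic lexer.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]")


def bag_of_tokens(src):
    """Multiset of Tokens in one code block."""
    return Counter(TOKEN_RE.findall(src))


def sub_block(bag, global_freq, theta):
    """Rarest-first prefix that any block with similarity > theta must overlap."""
    ordered = sorted(bag.elements(), key=lambda t: (global_freq[t], t))
    size = len(ordered)
    return ordered[: size - math.ceil(theta * size - 1e-9) + 1]


def similarity(bag_a, bag_b):
    """Shared Tokens (counted with multiplicity) over the larger block size."""
    larger = max(sum(bag_a.values()), sum(bag_b.values()))
    return sum((bag_a & bag_b).values()) / larger if larger else 0.0


def detect_clones(user_blocks, library_blocks, theta=0.8):
    """Return (user block, library block, score) for every clone pair."""
    user_bags = {bid: bag_of_tokens(s) for bid, s in user_blocks.items()}
    lib_bags = {bid: bag_of_tokens(s) for bid, s in library_blocks.items()}

    # Global Token frequency table over both user and library code.
    global_freq = Counter()
    for bag in [*user_bags.values(), *lib_bags.values()]:
        global_freq.update(bag)

    # Index library: sub-block Token -> library block ids.
    index = defaultdict(set)
    for bid, bag in lib_bags.items():
        for tok in sub_block(bag, global_freq, theta):
            index[tok].add(bid)

    clones = []
    for uid, ubag in user_bags.items():
        # Sub-block filtering: the user block's rarest Tokens are the query
        # keywords; the union of their posting lists is the candidate set.
        candidates = set()
        for tok in sub_block(ubag, global_freq, theta):
            candidates |= index.get(tok, set())
        # Verification: keep candidates whose similarity exceeds theta.
        for cid in sorted(candidates):
            score = similarity(ubag, lib_bags[cid])
            if score > theta:
                clones.append((uid, cid, round(score, 3)))
    return clones


if __name__ == "__main__":
    user = {"user.c#1": "int add(int x, int y) { return x + y; }"}
    library = {
        "lib/a.c#1": "int add(int x,int y){return x+y;}",
        "lib/b.c#1": "int add3(int x, int y, int z) { return x + y + z; }",
        "lib/c.c#1": "void log(char *msg) { printf(msg); }",
    }
    print(detect_clones(user, library))
```

With these inputs, the whitespace-only reformatting of `add` is reported with a score of 1.0. By contrast, `add3` shares 15 of its 21 Tokens (about 0.714) and is rejected at the hypothetical threshold of 0.8.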

Description

Technical field

[0001] The present invention relates to the technical fields of distributed computing and of code clone detection and search. In particular, it relates to a distributed code clone detection and search method, system and medium based on sub-block filtering.

Background

[0002] Code cloning refers to directly copying, or slightly modifying, a piece of code from one software system and then using it in another system as part of that system's code. Code is cloned for many reasons. The main source is developers cloning code to reduce their workload, which includes copying and modifying existing code fragments and reusing ready-made development frameworks and design patterns. Code cloning is so common that between 5% and 20% of today's code snippets contain duplicated or slightly modified cloned code. Code reuse is even more common in high-quality open-source software, where more than 50% of the code has been reused. The use of open source components in various software ...


Application Information

IPC (8): G06F8/75; G06F16/31; G06F16/33
CPC: G06F8/751; G06F16/316; G06F16/3331
Inventors: 任怡, 杨立明, 谭郁松, 汪哲, 李宝, 阳国贵, 黄辰林, 魏旭鹏, 周洁, 陈梓榕, 王瑞, 董攀, 张建锋, 王晓川, 丁滟, 谭霜, 蹇松雷
Owner: NAT UNIV OF DEFENSE TECH