A distributed code clone detection and search method, system and medium based on sub-block filtering

Applied in the field of distributed code clone detection and search, this search method and distributed technology addresses the low detection level of existing code search tools and their inability to handle similar-code retrieval and recommendation, and improves detection efficiency.

Active Publication Date: 2022-07-05
NAT UNIV OF DEFENSE TECH
Cites: 10; Cited by: 0

AI Technical Summary

Problems solved by technology

Current code search tools, such as Google Code Search, provide only a low level of detection and cannot handle the retrieval and recommendation of similar code.

Embodiment Construction

[0052] In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments.

[0053] As shown in Figure 1, the distributed code clone detection and search method based on sub-block filtering in this embodiment includes:

[0054] 1) The user code and the code-base source code are grouped and preprocessed in parallel: the code is divided into code blocks, each code block is converted into tokens, and the frequency of each token is counted, yielding intermediate files that contain the tokens and their frequency information;

[0055] 2) A global token frequency table is built by aggregating, in parallel, the tokens and frequency information from all intermediate files;

[0056] 3) The code blocks of the source code of the code base are grouped and processed in parallel using the global Token frequency table, and an index is esta...
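Step 3 is cut off in this excerpt, so the following Python sketch of steps 1 to 3 is illustrative only and is not the patent's implementation. It assumes a SourcererCC-style design: a hypothetical regex tokenizer (`TOKEN_RE`), thread pools standing in for distributed worker nodes, each block's tokens sorted rarest-first by global frequency, and only a prefix ("sub-block") of length |B| - ceil(θ·|B|) + 1 indexed per block, where θ is a hypothetical similarity threshold. All function names (`preprocess_parallel`, `build_global_table`, `build_index`) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import add

# Hypothetical tokenizer: identifiers/keywords and integer literals; operators and punctuation are dropped.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+")


def preprocess_group(group):
    """Step 1 for one group: turn each (block_id, code) pair into (block_id, token-frequency bag)."""
    return [(block_id, Counter(TOKEN_RE.findall(code))) for block_id, code in group]


def preprocess_parallel(blocks, n_groups=4):
    """Split code blocks into groups and preprocess them concurrently (threads stand in for cluster nodes)."""
    groups = [blocks[i::n_groups] for i in range(n_groups)]
    with ThreadPoolExecutor(max_workers=n_groups) as pool:
        return list(pool.map(preprocess_group, groups))  # one "intermediate file" per group


def summarize(intermediate_file):
    """Step 2, map side: total the token frequencies within one intermediate file."""
    total = Counter()
    for _block_id, bag in intermediate_file:
        total.update(bag)
    return total


def build_global_table(intermediate_files):
    """Step 2, reduce side: merge the per-file totals into the global token frequency table."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(summarize, intermediate_files))
    return reduce(add, partials, Counter())


def prefix_length(size, theta):
    """Assumed sub-block (prefix) length: size - ceil(theta * size) + 1."""
    return size - math.ceil(theta * size) + 1


def build_index(intermediate_files, global_freq, theta=0.8):
    """Step 3 (assumed): sort each block's tokens rarest-first and index only its prefix tokens."""
    index, ordered = defaultdict(set), {}
    for intermediate_file in intermediate_files:
        for block_id, bag in intermediate_file:
            toks = sorted(bag.elements(), key=lambda t: (global_freq.get(t, 0), t))
            ordered[block_id] = toks  # kept for the later similarity check
            for tok in toks[:prefix_length(len(toks), theta)]:
                index[tok].add(block_id)
    return index, ordered
```

Sorting rarest-first means the indexed prefixes consist of selective tokens, which keeps the inverted-index posting lists short; indexing only the prefix rather than every token is what makes the index small enough to query cheaply.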

Abstract

The invention discloses a distributed code clone detection and search method, system and medium based on sub-block filtering. The method groups the user code and the code-base source code and preprocesses them in parallel to obtain intermediate files containing tokens and their frequency information; builds a global token frequency table from all intermediate files; groups the code blocks of the code-base source code and processes them in parallel with the global token frequency table to build an index, yielding an index library; uses a sub-block filtering mechanism to extract keywords from the user code and queries the index library with them to obtain a candidate set for each code block of the user code; and, for each code block in the user code, calculates the similarity between that block and each candidate code block in its candidate set, judging the pair to be a clone pair if the similarity exceeds a preset value. The invention is independent of the programming language being analyzed, achieves high detection and search efficiency, is suitable for clone detection and search over large-scale code, and supports user code queries.
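The abstract does not define the similarity measure or the exact filter. The sketch below assumes SourcererCC-style choices: overlap similarity |A ∩ B| / max(|A|, |B|) over token multisets, the "preset value" modelled as a hypothetical threshold `theta`, and the sub-block filter taken to be the rarest-first token prefix, used both for indexing and as the query keywords. It is self-contained, so it repeats a minimal index builder; all names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def order_tokens(bag, global_freq):
    """Expand a token bag and sort it rarest-first by global frequency (ties broken by token text)."""
    return sorted(bag.elements(), key=lambda t: (global_freq.get(t, 0), t))


def prefix_length(size, theta):
    """Assumed sub-block (prefix) length: size - ceil(theta * size) + 1."""
    return size - math.ceil(theta * size) + 1


def build_index(code_base, global_freq, theta):
    """Index only the prefix tokens of each code-base block; keep full sorted lists for verification."""
    index, ordered = defaultdict(set), {}
    for block_id, bag in code_base:
        toks = order_tokens(bag, global_freq)
        ordered[block_id] = toks
        for tok in toks[:prefix_length(len(toks), theta)]:
            index[tok].add(block_id)
    return index, ordered


def overlap_similarity(a, b):
    """Assumed measure: |A ∩ B| / max(|A|, |B|) over token multisets."""
    if not a or not b:
        return 0.0
    shared = sum((Counter(a) & Counter(b)).values())
    return shared / max(len(a), len(b))


def detect_clones(user_blocks, index, ordered, global_freq, theta):
    """Query the index with each user block's prefix, then verify every candidate."""
    clone_pairs = []
    for user_id, bag in user_blocks:
        query = order_tokens(bag, global_freq)
        # Sub-block filtering: only the query's prefix tokens are used as search keywords.
        candidates = set()
        for tok in query[:prefix_length(len(query), theta)]:
            candidates |= index.get(tok, set())  # .get avoids inserting into the defaultdict
        # Verification: compute the full similarity only for surviving candidates.
        for cand_id in sorted(candidates):
            if overlap_similarity(query, ordered[cand_id]) >= theta:
                clone_pairs.append((user_id, cand_id))
    return clone_pairs
```

Under these assumptions the filter is lossless: two blocks whose overlap reaches ceil(θ·max(|A|, |B|)) must share at least one token in their rarest-first prefixes, so candidates are pruned without missing such pairs.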

Description

Technical field

[0001] The invention relates to the technical fields of distributed computing and code clone detection and search, and in particular to a distributed code clone detection and search method, system and medium based on sub-block filtering.

Background

[0002] Code cloning refers to copying a piece of code from one piece of software, either verbatim or with small modifications, and using it in other software as an integral part of that software's code. Code clones arise for many reasons; the main source is developers cloning code to reduce their workload, including copying and modifying existing code fragments and reusing the design patterns of ready-made development frameworks. Code cloning is very common: 5% to 20% of today's code snippets contain copied or slightly modified code. Code reuse is even more common in high-quality open source software, where more than 50% of the code is reused. The use of open source components in various software is ...

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F8/75, G06F16/31, G06F16/33
CPC: G06F8/751, G06F16/316, G06F16/3331
Inventors: 任怡, 杨立明, 谭郁松, 汪哲, 李宝, 阳国贵, 黄辰林, 魏旭鹏, 周洁, 陈梓榕, 王瑞, 董攀, 张建锋, 王晓川, 丁滟, 谭霜, 蹇松雷
Owner: NAT UNIV OF DEFENSE TECH