Detection and extraction method of repeated fragments in software codes
A technology for detecting repeated segments in software code, in the field of computer programs. It addresses problems of existing methods such as heavy computation, insufficient stability, and poor robustness, and it saves computation.
Status: Inactive. Publication date: 2017-01-04
UNIV OF SHANGHAI FOR SCI & TECH
AI Technical Summary
Problems solved by technology
This algorithm is slow, its detection quality is poor, and it has seen few practical applications.
[0008] 5. Method based on code quality measurement (K.Kontogiannis, R.DeMori, E.Merlo, M.Galler, M.Bernstein, Pattern matching for clone and concept detection, Journal of Automated Software Engineering 3(1–2)(1996) 77–108), this method is inefficient and computationally intensive, so it is rarely used in practice
[0009] 6. Index-based (also known as inverted-index) approach (Benjamin Hummel, Elmar Juergens, Lars Heinemann, and Michael Conradt. Index-based code clone detection: Incremental, distributed, scalable. In the International Conference on Software Maintenance, pages 1–9, Sept. 2010). The index-based method is very efficient, but the only existing index-based scheme, which relies on a sliding window, currently has poor performance and detection quality, is not stable enough, and has poor robustness.
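To make the index-based idea concrete, here is a minimal Python sketch of a fixed-size sliding-window index. It is written under our own assumptions and is not the implementation of Hummel et al.: each line is treated as one statement, `normalize` only collapses whitespace, and `normalize`, `window_hash`, `build_index`, `clone_windows`, and the `window` parameter are hypothetical names. Every window of consecutive normalized statements is hashed, and an inverted index maps each hash to the places where it occurs; a hash with more than one location marks a candidate clone.

```python
import hashlib
from collections import defaultdict


def normalize(line):
    # Hypothetical normalization: collapse whitespace only. Real detectors
    # usually also normalize identifiers, literals, and comments.
    return " ".join(line.split())


def window_hash(stmts):
    # Hash a window of normalized statements into a compact index key.
    return hashlib.sha256("\n".join(stmts).encode("utf-8")).hexdigest()


def build_index(files, window=3):
    """Inverted index: window hash -> list of (file name, start line index)."""
    index = defaultdict(list)
    for name, lines in files.items():
        stmts = [normalize(line) for line in lines]
        # Slide a fixed-size window over the statements of each file.
        for start in range(len(stmts) - window + 1):
            index[window_hash(stmts[start:start + window])].append((name, start))
    return index


def clone_windows(index):
    # Any key seen at more than one location is a candidate clone.
    return [locs for locs in index.values() if len(locs) > 1]


files = {
    "a": ["x = 1", "y = 2", "z = x + y", "print(z)"],
    "b": ["w = 0", "x = 1", "y  = 2", "z = x + y"],
}
print(clone_windows(build_index(files, window=3)))
# [[('a', 0), ('b', 1)]]
```

Because `window` is fixed, a clone shorter than the window is missed entirely and a long clone is reported as many overlapping windows, which hints at why a fixed window alone can give uneven detection quality.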
Method used
Examples
Embodiment
[0034] In this embodiment, repeated segments are extracted from the source code of two files to be checked, a and b.
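For a two-file case like this one, the automatically adjusted detection window mentioned in the abstract can be illustrated with a short sketch. This is a hypothetical interpretation, not the patented procedure: lines of a and b are compared as whole strings, an inverted index of seed windows of size `min_window` over b supplies candidate matches, and each seed is then widened while the following lines still match, so the reported length adapts to how much repeated context actually exists. The function `adaptive_matches` and its parameters are illustrative names.

```python
from collections import defaultdict


def adaptive_matches(a, b, min_window=2):
    """Return (start in a, start in b, length) for maximal runs of equal lines.

    A small seed window is looked up in an inverted index over `b`; each hit is
    then grown line by line for as long as the repeated context continues.
    """
    index = defaultdict(list)  # seed window (tuple of lines) -> start positions in b
    for j in range(len(b) - min_window + 1):
        index[tuple(b[j:j + min_window])].append(j)

    matches = []
    covered = set()  # (i, j) pairs already inside a reported match
    for i in range(len(a) - min_window + 1):
        for j in index.get(tuple(a[i:i + min_window]), []):
            if (i, j) in covered:
                continue  # seed lies inside a run that was already extended
            length = min_window
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1  # repeated context continues: widen the window
            covered.update((i + k, j + k) for k in range(length))
            matches.append((i, j, length))
    return matches


file_a = ["x = 1", "y = 2", "z = x + y", "print(z)", "end()"]
file_b = ["w = 0", "x = 1", "y = 2", "z = x + y", "print(z)", "q()"]
print(adaptive_matches(file_a, file_b))
# [(0, 1, 4)]
```

Scanning `a` in ascending order means every run is first met at its leftmost seed, so each reported match is maximal in both directions and its interior seeds are skipped rather than reported again.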

Abstract
The invention provides a method for detecting and extracting repeated fragments in software code. The method comprises steps 1 to 8 shown in the figures. Code fragments are extracted using the hierarchical information in the syntax tree, so the grammatical information within each fragment is taken into account and the extracted fragments are meaningful. In addition, extraction is controlled by a duplicate-checking mechanism based on an inverted index: if a repetition is found at a higher level, the lower levels beneath it are not extracted. Most current techniques check the smallest fragments for repetition and then combine them; compared with that approach, this method saves a large amount of computation. During detection, the size of the detection window is adjusted automatically according to whether repeated context actually exists, which improves performance and detection speed, so the method can be applied to detection in real-world scenarios. Finally, because the method incorporates grammatical structure information, its misjudgment rate is very low.
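The top-down, index-controlled extraction described above can be illustrated with a minimal sketch. The assumptions are ours, not the patent's: the input is Python source parsed with the standard `ast` module, only statement nodes count as fragments, `ast.dump` (which omits line and column attributes by default) serves as the structural key, and `min_nodes` is a hypothetical size threshold. When a statement's key is already in the inverted index, the statement is recorded as a repeat and its children are not visited, mirroring the rule that lower levels are not extracted once a higher-level repetition is found.

```python
import ast
from collections import defaultdict


def subtree_size(node):
    # Number of AST nodes in the subtree rooted at `node`.
    return sum(1 for _ in ast.walk(node))


def find_repeats(sources, min_nodes=8):
    """Top-down search for repeated statement fragments across files.

    `sources` maps a file name to Python source text. Returns the inverted
    index filtered to keys seen at more than one (file, line) location.
    """
    index = defaultdict(list)  # structural key -> [(file name, line number)]

    def visit(name, node):
        if isinstance(node, ast.stmt):
            if subtree_size(node) < min_nodes:
                return  # too small to be meaningful; its children are smaller still
            key = ast.dump(node)  # structure and names, no line/column attributes
            seen_before = key in index
            index[key].append((name, node.lineno))
            if seen_before:
                return  # higher-level repeat found: skip extracting sub-fragments
        for child in ast.iter_child_nodes(node):
            visit(name, child)

    for name, source in sources.items():
        visit(name, ast.parse(source))
    return {key: locs for key, locs in index.items() if len(locs) > 1}


func = (
    "def area(w, h):\n"
    "    if w < 0 or h < 0:\n"
    "        raise ValueError('negative')\n"
    "    return w * h\n"
)
sources = {"a": func + "\nprint(area(2, 3))\n", "b": func + "\ntotal = 0\n"}
print(list(find_repeats(sources).values()))
# [[('a', 1), ('b', 1)]]
```

Only the whole function is reported; its inner `if` statement is never re-checked in file b. Because `ast.dump` keeps identifier names, this sketch only catches exact structural copies; a fuller tool would likely normalize identifiers and literals first.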
Description
Technical field
[0001] The invention belongs to the field of computer programs, and in particular relates to a method for detecting repeated segments in software code.
Background art
[0002] Code duplication detection is very important in software development. First, it improves the maintainability of software. If duplicated code is allowed to spread throughout a code base, then whenever one copy must evolve or receive a defect fix, every other copy must be changed in the same way, which harms maintainability. Once duplicates are found through detection, they can be promptly extracted into functions to improve maintainability. Second, it reduces legal risk in software development. Software is released under different licenses; if a developer carelessly copies code carrying infectious license terms (such as the GNU license), it will bring serious harm to ...
Claims
Application Information
IPC(8): G06F11/36
CPC: G06F11/3608
Inventor: 张刚