Source code fragment pairwise comparison method based on coding sequence representation

A coding-sequence and pairwise-comparison technique in the field of computer programs. It addresses two problems, the inaccurate positioning of high-similarity fragments and the inability to support cross-granularity similarity matching, and achieves good applicability and matching performance.

Active Publication Date: 2021-02-26
NANJING UNIV OF AERONAUTICS & ASTRONAUTICS

AI Technical Summary

Problems solved by technology

[0005] The purpose of this invention is to provide a pairwise comparison method for source code fragments based on coding sequence representation. It addresses two technical problems of existing source code similarity matching algorithms: they cannot support cross-granularity similarity matching, and they do not position high-similarity segments accurately enough.

Embodiment Construction

[0032] To help those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments are described clearly and in detail below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

[0033] As shown in Figures 1-10, a pairwise comparison method for source code fragments based on coding sequence representation includes the following steps:

[0034] Step 1: Establish a source code database for storing source code text;

[0035] Establish a code conversion module, and use the code sequence source code representat...


Abstract

The invention discloses a pairwise comparison method for source code fragments based on coding sequence representation, and belongs to the technical field of computer programs. The method includes: converting source code text into a coding sequence representation using a coding sequence source code representation method based on static program analysis; processing the coding sequence of each source code fragment with the Burrows-Wheeler transform to obtain an index of the coding sequence; finding, through seed screening, highly similar subsequence alignment seeds from the indexes of the coding sequences; using the Smith-Waterman algorithm, taking the high-similarity seeds as the starting positions of subsequence alignment and extending them to subsequences that meet a certain similarity threshold; and locating the high-similarity parts between the source code fragments according to the source code line number information corresponding to the coding sequences. This solves the technical problems that cross-granularity similarity matching cannot be supported and that high-similarity fragment positioning is not accurate enough: cross-granularity source code similarity comparison is supported, and the source code texts being compared are not required to have the same granularity.
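The pipeline in the abstract (index, seed, extend, map back to lines) can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes source code has already been encoded as a list of non-negative integer codes, builds the Burrows-Wheeler index naively from a sorted suffix array, finds exact-match seeds by backward search, and runs a plain full-matrix Smith-Waterman alignment rather than a seed-anchored extension. The function names, the scoring parameters (`match`, `mismatch`, `gap`) and the `line_of_code` list are all hypothetical.

```python
def build_bwt_index(seq):
    """Naive suffix array, BWT and symbol-count table for a list of integer codes >= 0."""
    s = list(seq) + [-1]                       # -1: sentinel smaller than any code
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    bwt = [s[i - 1] for i in sa]               # symbol preceding each sorted suffix
    counts, total = {}, 0
    for c in sorted(set(s)):                   # counts[c]: symbols strictly smaller than c
        counts[c] = total
        total += s.count(c)
    return sa, bwt, counts


def find_seeds(index, pattern):
    """Backward search: sorted start positions where pattern occurs exactly."""
    sa, bwt, counts = index
    lo, hi = 0, len(bwt)                       # suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in counts:
            return []
        lo = counts[c] + bwt[:lo].count(c)
        hi = counts[c] + bwt[:hi].count(c)
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment: (score, (a_start, a_end), (b_start, b_end)), ends exclusive."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, end = 0, (0, 0)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, end = H[i][j], (i, j)
    i, j = end
    while i > 0 and j > 0 and H[i][j] > 0:     # trace back to where the alignment starts
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, end[0]), (j, end[1])


def span_to_lines(line_of_code, span):
    """Map a half-open code span to its (first_line, last_line) in the source."""
    start, end = span
    return line_of_code[start], line_of_code[end - 1]


# Hypothetical encoded fragments and the source line each code of `a` came from.
a = [7, 3, 1, 4, 1, 9]
b = [3, 1, 4, 1, 5, 1, 4, 1]
line_of_code_a = [1, 2, 2, 3, 4, 5]

seeds = find_seeds(build_bwt_index(b), [1, 4, 1])
score, span_a, span_b = smith_waterman(a, b)
lines_a = span_to_lines(line_of_code_a, span_a)
```

Because the alignment works on code positions rather than on whole functions or files, the two fragments do not need to share a granularity; only the mapping from codes back to line numbers is needed to report the matching region.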

Description

Technical field

[0001] The invention belongs to the technical field of computer programs and relates to a pairwise comparison method for source code fragments based on coding sequence representation.

Background technique

[0002] Source code similarity detection is widely used in many software development tasks, for example detecting code plagiarism and duplicate code through clone detection, locating software faults through similarity matching, and recommending code or generating fixes. These tasks need a source code similarity matching algorithm to retrieve and quantitatively analyze similar code.

[0003] Commonly used code similarity calculation methods usually represent the source code text as text, symbols, tree structures or graph structures, and then use the corresponding similarity definition to calculate the similarity between two pieces of source code. The text-based method is to use the source code text as a string sequence or set for text matchin...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F8/75
CPC: G06F8/751
Inventor: 黄志球, 喻垚慎, 李伟湋, 沈国华, 邵宜超
Owner: NANJING UNIV OF AERONAUTICS & ASTRONAUTICS