
Fast similarity detection and evidence generation for large-scale programs based on code mapping and lexical analysis

The technology combines lexical analysis with similarity detection and applies to source code program analysis and software plagiarism detection. It addresses the limited applicability of existing plagiarism detection methods, the failure of detection technology under obfuscation, and the difficulty of optimizing detection accuracy and time overhead together.

Active Publication Date: 2019-03-29
XI AN JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

[0005] 1) Most current source code detection technologies target small-scale plagiarism detection and cannot meet the need for rapid detection in large-scale scenarios;
[0006] 2) Advances in obfuscation technology make software plagiarism detection harder and render some plagiarism detection technologies ineffective;
[0007] 3) Many existing plagiarism detection technologies report only a simple result and provide no specific, strong evidence of plagiarism;
[0008] 4) Existing plagiarism detection technologies struggle to optimize detection accuracy and time overhead simultaneously, so many plagiarism detection methods cannot be put into practical use.



Examples


Embodiment Construction

[0104] The specific implementation of the method for fast similarity detection and evidence generation for large-scale programs, based on code mapping and lexical analysis, is described in detail below with reference to the accompanying drawings.

[0105] Figure 1 is the overall flowchart of the method for fast similarity detection and evidence generation for large-scale programs based on code mapping and lexical analysis.

[0106] The invention discloses a method for fast similarity detection and evidence generation for large-scale programs based on code mapping and lexical analysis, comprising the following steps:

[0107] Step S101: Obtain the third-party library call information and word frequency information of each sample program by analyzing the code of the sample programs in the sample set A to be tested and the source code sample set B; based on the idea of code mapping, call the third-party library of the sample program informati...
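As a rough illustration of the feature extraction named in Step S101, the sketch below collects library references and word frequencies from one source file. The step text is truncated in this copy, so the regular expressions, the function name `extract_features`, and the choice to treat `import` / `#include` lines as third-party library call information are all assumptions made for illustration, not the patented procedure.

```python
import re
from collections import Counter

# Hypothetical patterns: the source does not say how library calls are recognised.
IMPORT_RE = re.compile(r"^\s*(?:import|from|#include)\s+[<\"]?([\w./]+)", re.MULTILINE)
WORD_RE = re.compile(r"[A-Za-z_]\w*")


def extract_features(source):
    """Return (set of referenced libraries, Counter of word frequencies)."""
    libraries = set(IMPORT_RE.findall(source))   # one entry per import-like line
    words = Counter(WORD_RE.findall(source))     # identifiers and keywords only
    return libraries, words


sample = "import numpy\nfrom os import path\ntotal = numpy.sum([1, 2])\n"
libs, freq = extract_features(sample)
```

Features of this kind are cheap to compute per file, which is what makes a coarse first pass over a large sample set plausible; how the patent maps them into a comparable form is not visible in the truncated text.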


Abstract

The invention discloses a method for fast similarity detection and evidence generation for large-scale programs based on code mapping and lexical analysis. A two-layer similarity detection method is used to detect plagiarism and generate evidence for large-scale software samples. First, code mapping is used to analyze the coarse-grained similarity of large-scale programs and quickly find suspected similar programs. Lexical analysis is then applied to the suspected programs at a fine-grained level to determine program similarity and generate similar-code evidence. In this way, plagiarized code can be found quickly and accurately in large-scale samples, with corresponding evidence to support the finding.
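The two-layer idea in the abstract can be sketched as a cheap filter followed by a token-level comparison. This is a minimal sketch under stated assumptions: Jaccard overlap of library sets stands in for the coarse code-mapping layer, Python's `difflib.SequenceMatcher` stands in for the fine-grained lexical analysis, and the names `coarse_score`, `fine_compare`, `detect`, and both thresholds are hypothetical. None of these choices is confirmed by the source.

```python
import re
from difflib import SequenceMatcher

# Simple tokenizer: identifiers, integers, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def coarse_score(libs_a, libs_b):
    """Jaccard overlap of library sets (assumed stand-in for code mapping)."""
    union = libs_a | libs_b
    return len(libs_a & libs_b) / len(union) if union else 0.0


def fine_compare(src_a, src_b, min_tokens=4):
    """Token-level similarity ratio plus matched token runs kept as evidence."""
    tok_a, tok_b = TOKEN_RE.findall(src_a), TOKEN_RE.findall(src_b)
    matcher = SequenceMatcher(None, tok_a, tok_b, autojunk=False)
    evidence = [
        " ".join(tok_a[m.a:m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_tokens  # drop short, uninformative matches
    ]
    return matcher.ratio(), evidence


def detect(sample, candidates, coarse_threshold=0.5):
    """sample and candidates are (name, libs, source) tuples."""
    _, libs_s, src_s = sample
    results = {}
    for name, libs_c, src_c in candidates:
        # Layer 1: only candidates passing the coarse filter reach layer 2.
        if coarse_score(libs_s, libs_c) >= coarse_threshold:
            results[name] = fine_compare(src_s, src_c)
    return results


suspect = ("a.py", {"numpy"}, "total = numpy.sum(values)")
corpus = [
    ("b.py", {"numpy"}, "result = numpy.sum(values)"),
    ("c.py", {"requests"}, "r = requests.get(url)"),
]
report = detect(suspect, corpus)
```

In this toy run only `b.py` passes the coarse filter, so the more expensive token comparison is never run against `c.py`; the matched token runs returned by `fine_compare` play the role of the similar-code evidence.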

Description

Technical field

[0001] The invention relates to the technical field of source code program analysis and software plagiarism detection, in particular to a multi-layer source code program similarity detection method.

Background technique

[0002] With the rapid development of the computer software industry, more and more researchers, educators and software companies pay more and more attention to the security of software. The open source of computer software has brought more convenient conditions for software plagiarism. In recent years, various software infringement cases have occurred from time to time, and Google, Apple, eBay, etc. have all been involved in related cases.

[0003] In order to combat software plagiarism cases and protect software intellectual property rights, researchers at home and abroad have proposed a large number of software plagiarism detection technologies. Based on application scenarios and technical means, existing software plagiarism detection te...


Application Information

IPC(8): G06F11/36
CPC: G06F11/3608
Inventor: 刘烃, 贾昂, 徐茜, 范铭, 魏闻英, 楼隽真
Owner: XI AN JIAOTONG UNIV