Multi-thread program plagiarism detection method based on frequent pattern mining

A detection method based on frequent pattern mining, applied in the field of program/content distribution protection, that reduces the interference caused by thread-interleaving uncertainty

Active Publication Date: 2019-12-03
XIAN UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

The uncertainty of thread interleaving makes the behavior of multi-threaded programs highly nondeterministic as well, so traditional dynamic birthmark techniques produce highly random results when analyzing multi-threaded programs.

Method used

Frequent pattern mining is used to extract behavior patterns from the multiple execution traces produced by repeated runs of a program under the same input, and a thread-aware birthmark is built from those patterns.


Examples


Embodiment 1

[0069] Example 1: Assume that program p has been executed twice under some input I₁, and that two execution traces are obtained after interference-item filtering, one for each execution. Here I₁¹ and I₁² denote the first and second executions of the program under input I₁.

[0070] Assume that τ is 3 (for ease of exposition only; in practice a value between 8 and 10 is preferred). Processing the trace of the first execution in step S303 then yields the following pattern candidate set (repeated elements allowed) for that execution trace:

[0071]

[0072] Similarly, the pattern candidate set of the trace corresponding to the second execution is:

[0073]

[0074] The pattern candidate set (repeated elements allowed) of program p under input I₁ is then:

[0075]
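
As a rough illustration of how such candidate sets could be built, the Python sketch below assumes that step S303 slides a window of length τ over each filtered trace and keeps duplicates. The text above does not spell out the procedure, so the windowing rule, the event names in the sample traces, and the function names `candidate_patterns` and `merge_candidates` are all hypothetical.

```python
from collections import Counter

def candidate_patterns(trace, tau):
    """Return every length-tau window of a trace as a multiset (assumed S303 behavior)."""
    return Counter(tuple(trace[i:i + tau]) for i in range(len(trace) - tau + 1))

def merge_candidates(traces, tau):
    """Combine the candidate multisets of several executions under the same input."""
    total = Counter()
    for trace in traces:
        total += candidate_patterns(trace, tau)
    return total

# Hypothetical filtered traces of two executions under the same input.
run1 = ["open", "read", "lock", "write", "unlock"]
run2 = ["open", "lock", "read", "write", "unlock"]
candidates = merge_candidates([run1, run2], tau=3)
```

A `Counter` is used because the candidate set allows repeated elements, so the number of occurrences of each window is preserved for the later mining step.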

[0076] Step S103: use the frequent pattern mining algorithm to process the candidat...

Embodiment 2

[0083] Embodiment 2: Assume that, following step S401, the pattern candidate set of program p from Embodiment 1 under input I₁ is processed, and that the resulting frequent pattern set is:

[0084]

[0085] Then, following the flow described in steps S401-S405, the thread-aware birthmark of program p under input I₁ is obtained:

[0086]
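
The flow from candidate set to birthmark can be sketched in Python as follows. Steps S401-S405 are not detailed in this text, so the sketch makes several assumptions: a pattern counts as frequent when its occurrence count reaches a hypothetical `min_support`, patterns are hashed with SHA-256, and the birthmark is a mapping from hashed pattern to frequency. None of these choices is confirmed by the source.

```python
import hashlib
from collections import Counter

def frequent_patterns(candidates, min_support):
    """Keep patterns whose occurrence count reaches min_support (assumed criterion)."""
    return {p: c for p, c in candidates.items() if c >= min_support}

def pattern_key(pattern):
    """Hash a pattern (a tuple of event names) to a short hexadecimal key."""
    joined = "\x1f".join(pattern).encode("utf-8")
    return hashlib.sha256(joined).hexdigest()[:16]

def thread_aware_birthmark(candidates, min_support):
    """Map each hashed frequent pattern to its frequency (assumed birthmark layout)."""
    frequent = frequent_patterns(candidates, min_support)
    return {pattern_key(p): c for p, c in frequent.items()}

# Hypothetical candidate multiset gathered from several runs under one input.
candidates = Counter({
    ("open", "read", "write"): 2,
    ("read", "write", "close"): 2,
    ("open", "lock", "read"): 1,
})
birthmark = thread_aware_birthmark(candidates, min_support=2)
```

Patterns that appear in only one interleaving fall below the support threshold and drop out, which is the intuition behind using frequent patterns to dampen scheduling noise.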

[0087] Step S104: Given the thread-aware birthmarks of the plaintiff program p and the defendant program q under input I, formula (1) is used to calculate the birthmark similarity:

[0088]
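
Formula (1) is not reproduced in this text, so the Python sketch below substitutes a generic measure, weighted Jaccard similarity, purely to show the shape of a birthmark comparison. It should not be read as the patent's formula. The birthmark layout (a mapping from pattern key to frequency) and the name `birthmark_similarity` are assumptions.

```python
def birthmark_similarity(bm_p, bm_q):
    """Weighted Jaccard similarity of two key-to-frequency maps (stand-in measure)."""
    keys = set(bm_p) | set(bm_q)
    # Sum of element-wise minima over sum of element-wise maxima.
    shared = sum(min(bm_p.get(k, 0), bm_q.get(k, 0)) for k in keys)
    total = sum(max(bm_p.get(k, 0), bm_q.get(k, 0)) for k in keys)
    # Two empty birthmarks carry no evidence, so report zero similarity.
    return shared / total if total else 0.0

# Hypothetical birthmarks of a plaintiff program p and a defendant program q.
bm_p = {"k1": 2, "k2": 1}
bm_q = {"k1": 1, "k3": 1}
score = birthmark_similarity(bm_p, bm_q)
```

The result lies in the range 0 to 1, with 1 meaning the two birthmarks contain the same patterns with the same frequencies.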

Embodiment 3

[0089] Example 3: For another program q, assume that a frequent pattern set is extracted under input I₁ according to steps S101-S103, and that the resulting thread-aware birthmark is:

[0090] Then the similarity of the software birthmarks of program q and program p from Example 2 under input I₁ is:

[0091]

[0092] Step S105: A dynamic birthmark is input-dependent: it abstracts the semantics and behavior of the program under a specific input, so a judgment based on a single input is unreliable. Therefore, multiple different inputs are provided and steps S101-S104 are repeated to obtain the birthmark similarity of the plaintiff and defendant programs under each input, and the mean of these similarities is taken as the similarity of the two programs. Specifically, formula (2) is used to calculate program similarity:

[0093]
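
The averaging described in step S105 can be sketched in Python as below. The mean over inputs follows the text; the comparison direction (`>=`), the sample similarity values, the threshold values, and the function names `program_similarity` and `is_plagiarism` are assumptions made for illustration.

```python
from statistics import mean

def program_similarity(per_input_similarities):
    """Mean of the birthmark similarities obtained under each input."""
    if not per_input_similarities:
        raise ValueError("at least one input is required")
    return mean(per_input_similarities)

def is_plagiarism(per_input_similarities, threshold):
    """Flag plagiarism when the mean similarity reaches the threshold (assumed rule)."""
    return program_similarity(per_input_similarities) >= threshold

# Hypothetical birthmark similarities under three different inputs.
sims = [0.5, 0.75, 1.0]
overall = program_similarity(sims)
```

Averaging over several inputs keeps a single unrepresentative input from deciding the verdict on its own.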


Abstract

The invention discloses a multi-threaded program plagiarism detection method based on frequent pattern mining. The method comprises the following steps: 1) dynamically monitoring the program to obtain multiple execution traces from repeated executions under the same input; 2) preprocessing the set of execution traces to generate a pattern candidate set; 3) processing the pattern candidate set with a frequent pattern mining algorithm to generate a frequent pattern set, hashing it, and then constructing a thread-aware birthmark; 4) calculating the similarity between the birthmarks of the plaintiff program and the defendant program under a specific input; and 5) making the plagiarism judgment and outputting the detection result based on the mean birthmark similarity over multiple inputs and a given threshold. The method analyzes the executable program directly and does not require program source code. By using frequent pattern mining to extract behavior patterns from the multiple execution traces produced by repeated runs under the same input, it generates a thread-aware birthmark and greatly reduces the interference of thread-interleaving uncertainty.

Description

Technical field

[0001] The invention belongs to the technical field of program execution trace analysis and software plagiarism detection, and in particular relates to a multi-threaded program plagiarism detection method based on frequent pattern mining.

Background art

[0002] In recent years, with the vigorous development of open source software communities such as GitHub and SourceForge, the software industry has achieved unprecedented prosperity. At the same time, software plagiarism has become increasingly serious, and misuse of other people's code is not uncommon. On the one hand, there is no lack of premeditated plagiarism driven by economic interests, such as the recent "Red Core" incident, in which the Red Core browser, claimed to have an independently developed domestic kernel, was merely a simple repackaging of the Google Chrome browser; in addition, many large software companies often integrate some software components from upstream companies in ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F21/14
CPC: G06F21/14
Inventor: 田振洲, 王清, 高聪, 王忠民, 陈彦萍, 张恒山
Owner XIAN UNIV OF POSTS & TELECOMM