Disk sensitive information scanning system based on an MPI-based parallel AC string matching algorithm

A sensitive-information and parallel-algorithm technology, applied in the field of disk sensitive information scanning systems

Status: Inactive
Publication Date: 2017-08-29
HARBIN UNIV OF SCI & TECH
Cites: 3; Cited by: 6

AI Technical Summary

Problems solved by technology

[0013] The purpose of the present invention is to propose a method that effectively solves the data block edge matching problem arising when the AC algorithm is parallelized, and to implement that method in a disk sensitive information scanning system.



Examples


example 1

[0119] Example 1: The matching problem of AC algorithm data block edge.

[0120] Data parallelism for the AC algorithm means distributing the scanned text evenly across N parallel processes. However, the algorithm must then handle matches that fall on the edge between data blocks. Suppose the given pattern set is {de, ef} and the given text is abcdefgh. The text is divided into 2 data blocks, namely Block_1={abcd} and Block_2={efgh}, which are assigned to the parallel processes Thread_1 and Thread_2 for execution. The AC algorithm in Thread_1 finds no pattern, and the AC algorithm in Thread_2 finds the pattern "ef". The occurrence of "de" straddles the edge between Block_1 and Block_2: its "d" falls in Block_1 and its "e" in Block_2, so neither process sees the whole pattern and "de" is missed. Data parallelism of the AC algorithm therefore has a serious "data block edge matching" problem.
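The missed match in Example 1 can be reproduced with a short sketch. The Python below is illustrative only: the function names `build_automaton` and `ac_search` are hypothetical, and the remedy shown (extending the first block by the longest pattern length minus 1) is a commonly used assumption, not necessarily the rule used by the invention.

```python
from collections import deque

def build_automaton(patterns):
    """Build Aho-Corasick goto/fail/output tables for a list of patterns."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def ac_search(text, automaton, offset=0):
    """Return (start position, pattern) pairs; offset is the block's start in the full text."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((offset + i - len(pat) + 1, pat))
    return hits

patterns = ["de", "ef"]
text = "abcdefgh"
automaton = build_automaton(patterns)

# Naive split into Block_1 = "abcd" and Block_2 = "efgh": "de" is missed.
naive = ac_search(text[:4], automaton, 0) + ac_search(text[4:], automaton, 4)

# Assumed remedy: let Block_1 read (longest pattern - 1) extra characters.
overlap = max(len(p) for p in patterns) - 1
fixed = sorted(set(ac_search(text[:4 + overlap], automaton, 0))
               | set(ac_search(text[4:], automaton, 4)))

print(naive)  # [(4, 'ef')]
print(fixed)  # [(3, 'de'), (4, 'ef')]
```

The results are merged through a set because a pattern shorter than the overlap could be reported by both blocks.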

[0121] In actual use, due to the neglect ...

example 2

[0122] Example 2: An example of a disk sensitive information scanning system based on the MPI-based AC string matching parallel algorithm.

[0123] Assume that the pattern string set file is ExamplePattern and that the maximum pattern string length is MaxPattern=10. The user sets the disk directory to scan as UserSetDir. The maximum data block length for a scanned file is set to MaxBlock=300. The file currently being scanned is "mydata.txt", and its total number of data blocks is N_Block=3. The queue of files to scan is Q_File: the directory tree is traversed level by level, and its leaf nodes are inserted into Q_File.
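The level-by-level traversal that fills Q_File might look like the following minimal sketch. The function name `build_file_queue` is hypothetical, and two choices are assumptions not stated in the text: leaf nodes are taken to be regular files, and entries are visited in name order.

```python
import os
from collections import deque

def build_file_queue(root_dir):
    """Breadth-first walk of root_dir; returns the files in visiting order."""
    q_file = deque()              # plays the role of Q_File
    pending = deque([root_dir])   # directories still to be visited
    while pending:
        current = pending.popleft()
        with os.scandir(current) as entries:
            for entry in sorted(entries, key=lambda e: e.name):
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    q_file.append(entry.path)
    return q_file
```

Calling `build_file_queue(UserSetDir)` with the user-selected directory would yield files from shallower levels before those from deeper levels.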

[0124] The main process of the system is:

[0125] Initialize the MPI operating environment, and load the MPI system function library and the MPI version information. Assume that the system has 4 processor cores, that is, N_Procs=4. Master is the master process, and Slave_1, Slave_2, and Slave_3 are slave processes.

[0126] The ...

example 3

[0142] Example 3: Example of function BlockPart().

[0143] Suppose the length of the file File is Len_File=100.

[0144] The implementation process of BlockPart() is as follows:

[0145] Because Len_File=100 is less than MaxBlock, N_Block=1 is set in the data block information table Block_Info. This design prevents files that are too small from being divided into blocks, and so avoids wasting parallel resources.
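The small-file rule can be sketched as follows. Only the single-block case is taken from the text. The handling of larger files is an assumption (fixed MaxBlock-sized steps, each block extended by MaxPattern-1 bytes so that edge matches are not lost), not the patent's step S3 rule. The name `block_part` and the value MaxBlock=200 used for the small file are hypothetical.

```python
def block_part(len_file, max_block, max_pattern):
    """Return a list of (start, end) byte ranges, end exclusive."""
    if len_file <= max_block:
        # Small file: keep it as one block (N_Block = 1).
        return [(0, len_file)]
    overlap = max_pattern - 1     # assumed overlap so edge matches are not lost
    blocks, cur = [], 0
    while True:
        end = min(cur + max_block + overlap, len_file)
        blocks.append((cur, end))
        if end == len_file:
            break
        cur += max_block
    return blocks

print(block_part(100, 200, 10))  # [(0, 100)]
print(block_part(500, 200, 10))  # [(0, 209), (200, 409), (400, 500)]
```

Under this assumed scheme, a match lying entirely inside an overlap region may be reported by two blocks, so results would need to be deduplicated by position.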

[0146] Suppose the length of the file File is Len_File=500, MaxBlock=200, MaxPattern=10.

[0147] The implementation process of the function BlockPart() is:

[0148] Define the current location of the file as CurLoc=0, set the initial value of N_Block of the data block information table Block_Info equal to 1, and initialize the data item FileName of the data block information table Block_Info.

[0149] In step S3, (CurLoc+MaxBlock)

[0150] And (Len_File–CurLoc–MaxBlock)>MaxBlock, it can be obtained: (500-0-200)=300>200, so it can be obta...


Abstract

The invention provides a disk sensitive information scanning system based on an MPI-based parallel AC string matching algorithm. The system comprises: a first step of obtaining the pattern string set, the specified scan directory, the file data blocks, and the AC algorithm automaton; a second step of building the MPI execution environment in the system's master process on a multi-core processor architecture, dynamically querying the working state of the processors, and allocating data blocks to the slave processes so that sensitive information is searched for in parallel; and a third step of executing the deterministic finite automaton matching algorithm in parallel in the slave processes of the multi-core processor, recording the positions of the sensitive information, and dynamically reporting the working state of the processors. Through MPI, the system can make effective use of the computing resources of a multi-core processor and improve the execution performance of the AC string matching algorithm. It is particularly suitable for information security applications that require fast scanning of large volumes of sensitive information on computer disks, and it can also be applied to information security inspection and early-warning systems.

Description

Technical field

[0001] The invention relates to the technical field of data information security, in particular to a disk sensitive information scanning system based on an MPI-based parallel AC string matching algorithm.

Background technique

[0002] With the continuous development of computer technology, information technology has an increasingly strong influence on people's daily life. As a carrier of information, the computer disk stores a large amount of sensitive information belonging to state agencies, enterprises, institutions, and individuals. Once this information is leaked or attacked by hackers, it causes losses to the organizations and individuals concerned. Regular scanning of sensitive disk information by administrators can therefore warn the relevant personnel and help them protect such information, and can raise awareness of the need to protect sensitive information.

[0003] The disk sensitive information scanning system is a security p...


Application Information

IPC(8): G06F21/62
CPC: G06F21/6218
Inventor: 刘嘉辉, 马翠平, 宋大华
Owner: HARBIN UNIV OF SCI & TECH