Magnetic disk sensitive information scanning system based on an MPI parallel AC string matching algorithm
A technology relating to sensitive information and parallel algorithms, applied in the field of disk sensitive information scanning systems
Examples
example 1
[0119] Example 1: The data block edge matching problem of the AC algorithm.
[0120] Data parallelism for the AC algorithm distributes the text to be scanned evenly among N parallel processes. However, this approach must solve the matching problem at the edges of data blocks. Suppose the pattern set is {de, ef} and the text is abcdefgh. The text is divided into 2 data blocks, Block_1={abcd} and Block_2={efgh}, which are assigned to the parallel processes Thread_1 and Thread_2, respectively. The AC algorithm running in Thread_1 finds no pattern, and the AC algorithm running in Thread_2 finds the pattern "ef". The occurrence of "de", however, straddles the boundary between Block_1 and Block_2: its characters are split across the two blocks, so this division causes the pattern "de" to be missed. Data parallelism for the AC algorithm therefore suffers from a serious "data block edge matching" problem.
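The following Python sketch reproduces the {de, ef} / abcdefgh case with a small textbook Aho-Corasick automaton. It is illustrative only: `build_ac`, `ac_search` and `scan_blocks` are hypothetical names, and the remedy shown (each block after the first re-reads the preceding MaxPattern-1 characters) is a standard overlap technique assumed here, not necessarily the exact rule of this patent. Any match that crosses a boundary starts at most MaxPattern-1 characters before it, so the left extension captures it.

```python
from collections import deque

def build_ac(patterns):
    """Build an Aho-Corasick automaton: trie transitions, failure links, outputs."""
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:
        state = 0
        for ch in p:
            if ch not in goto[state]:
                goto.append({}); fail.append(0); out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(p)
    # Breadth-first pass: depth-1 states keep failure link 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]  # inherit patterns that end at the fallback state
    return goto, fail, out

def ac_search(text, automaton):
    """Return (start_offset, pattern) for every match in text."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for p in out[state]:
            hits.append((i - len(p) + 1, p))
    return hits

def scan_blocks(text, patterns, n_blocks, overlap):
    """Scan text as n_blocks independent blocks; blocks after the first
    are extended `overlap` characters to the left."""
    ac = build_ac(patterns)
    size = -(-len(text) // n_blocks)  # ceiling division
    found = set()  # a set removes matches seen twice inside an overlap region
    for b in range(n_blocks):
        start = max(0, b * size - overlap)
        end = min(len(text), (b + 1) * size)
        for pos, p in ac_search(text[start:end], ac):
            found.add((start + pos, p))  # convert to a global offset
    return sorted(found)

# Without overlap only "ef" is found; with overlap = MaxPattern - 1 = 1, "de" is recovered.
print(scan_blocks("abcdefgh", ["de", "ef"], 2, 0))
print(scan_blocks("abcdefgh", ["de", "ef"], 2, 1))
```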
[0121] In actual use, due to the neglect ...
example 2
[0122] Example 2: An example of the disk sensitive information scanning system based on the MPI-based parallel AC string matching algorithm.
[0123] Assume that the pattern string collection file is ExamplePattern and that the maximum pattern string length is MaxPattern=10. The user sets the disk directory to be scanned as UserSetDir. The maximum length of a data block of the scanned file is set to MaxBlock=300. The file currently being scanned is "mydata.txt", and its total number of data blocks is N_Block=3. The queue of files to be scanned is Q_File: the directory tree is traversed level by level, and its leaf nodes are inserted into Q_File.
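A minimal sketch of the level-by-level traversal that fills Q_File, assuming that "leaf nodes" means regular files and that unreadable directories are simply skipped. The function name `build_file_queue` is hypothetical, and entries are sorted by name only to make the order deterministic.

```python
import os
from collections import deque

def build_file_queue(root):
    """Breadth-first traversal of the directory tree rooted at root.

    Returns Q_File, a FIFO queue of file paths (the leaf nodes), ordered
    level by level: files nearer the root come first.
    """
    q_file = deque()      # Q_File: files waiting to be scanned
    dirs = deque([root])  # directories still to be expanded
    while dirs:
        current = dirs.popleft()
        try:
            with os.scandir(current) as it:
                entries = sorted(it, key=lambda e: e.name)
        except PermissionError:
            continue      # assumption: skip directories we cannot read
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                dirs.append(entry.path)    # expand on a later level
            elif entry.is_file(follow_symlinks=False):
                q_file.append(entry.path)  # leaf node: queue it for scanning
    return q_file
```

Calling `build_file_queue(UserSetDir)` would then yield the files in the order the scanner dequeues them.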
[0124] The main workflow of the system is as follows:
[0125] Initialize the MPI runtime environment and load the MPI system function library and the MPI version information. Assume that the system has 4 processor cores, that is, N_Procs=4. Master is the master process, and Slave_1, Slave_2, and Slave_3 are the slave processes.
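As an illustration of how the Master could hand data blocks to the three slaves, here is a small sketch. It assumes that rank 0 is the Master, ranks 1 to N_Procs-1 are Slave_1 to Slave_3, and that blocks are dealt out round-robin. That distribution policy and the name `assign_blocks` are hypothetical, since the patent's own distribution steps are elided here.

```python
def assign_blocks(n_block, n_procs):
    """Map each data block index to the slave rank that will scan it.

    Rank 0 is the Master; ranks 1..n_procs-1 are the slaves.
    Blocks are assigned round-robin (an assumption for illustration).
    """
    if n_procs < 2:
        raise ValueError("need at least one slave process")
    n_slaves = n_procs - 1
    plan = {rank: [] for rank in range(1, n_procs)}
    for block in range(n_block):
        plan[1 + block % n_slaves].append(block)
    return plan

# With N_Block=3 and N_Procs=4, each slave receives exactly one block of mydata.txt.
print(assign_blocks(3, 4))
```

In an mpi4py program, the Master could send each slave its list with `comm.send(plan[rank], dest=rank)` and each slave could receive it with `comm.recv(source=0)`; a round-robin plan keeps the load balanced when N_Block exceeds the number of slaves.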
[0126] The ...
example 3
[0142] Example 3: An example of the function BlockPart().
[0143] Suppose the length of the file File is Len_File=100.
[0144] The implementation process of BlockPart() is as follows:
[0145] Since Len_File=100 is less than MaxBlock, N_Block in the data block information table Block_Info is set to 1. This design prevents files that are too small from being divided into blocks and avoids wasting parallel resources.
[0146] Now suppose the length of the file File is Len_File=500, with MaxBlock=200 and MaxPattern=10.
[0147] The implementation process of the function BlockPart() is:
[0148] Define the current file position as CurLoc=0, set the initial value of N_Block in the data block information table Block_Info to 1, and initialize the data item FileName of Block_Info.
[0149] In step S3, (CurLoc+MaxBlock)<Len_File, i.e., (0+200)=200<500.
[0150] Also, (Len_File-CurLoc-MaxBlock)>MaxBlock, i.e., (500-0-200)=300>200, so it can be obta...
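The sketch below approximates BlockPart() under stated assumptions; it is not the patent's exact procedure. Files shorter than MaxBlock stay as one block, as described above. Larger files advance CurLoc by MaxBlock, and each block after the first also re-reads the preceding MaxPattern-1 bytes to avoid the edge problem of Example 1. The `BlockInfo` layout, the overlap rule, and the treatment of the final short block are all assumptions: the elided remainder of step S3 may handle the tail differently, for example by merging it into the previous block.

```python
from dataclasses import dataclass, field

@dataclass
class BlockInfo:
    """Hypothetical layout of the data block information table Block_Info."""
    file_name: str
    n_block: int = 1
    blocks: list = field(default_factory=list)  # (start, end) byte offsets

def block_part(file_name, len_file, max_block, max_pattern):
    """Split a file of len_file bytes into data blocks.

    Each block contributes up to max_block bytes of new data; every block
    after the first starts max_pattern - 1 bytes early (assumed rule).
    """
    info = BlockInfo(file_name)
    if len_file < max_block:        # small file: do not split it
        info.blocks.append((0, len_file))
        return info
    overlap = max_pattern - 1
    cur_loc = 0                     # CurLoc: start of the next block's new data
    while cur_loc < len_file:
        start = max(0, cur_loc - overlap)
        end = min(cur_loc + max_block, len_file)
        info.blocks.append((start, end))
        cur_loc += max_block
    info.n_block = len(info.blocks)
    return info

# Len_File=500, MaxBlock=200, MaxPattern=10 under these assumptions:
print(block_part("File", 500, 200, 10))
```

Under these assumptions the 500-byte file yields three blocks, (0, 200), (191, 400) and (391, 500), while the 100-byte file yields a single block with N_Block=1.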
