Maximal frequent sequential pattern mining method based on distributed log

A frequent sequential pattern mining technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses problems such as low efficiency, the lack of parallel extraction of sequential patterns, and high running-time cost when mining sequential patterns from logs.

Active Publication Date: 2018-11-23
FUJIAN NORMAL UNIV

AI Technical Summary

Problems solved by technology

[0005] Problem 1: When sequential patterns are extracted from huge amounts of distributed log data, the main problems at present are very low efficiency and the lack of an effective way to extract sequential patterns in parallel.
[0006] Problem 2: Existing sequential pattern mining algorithms generally need to maintain a large set of candidate sequences in the stage of discovering frequent itemsets. When the support threshold is low, the r

Embodiment Construction

[0027] The technical solution of the present invention will be specifically described below in conjunction with the accompanying drawings.

[0028] Efficient log sequence analysis mainly involves three steps: log preprocessing, log sequence pattern discovery, and log analysis. Among these, improving log sequence pattern mining algorithms is the key direction of recent research. A large number of sequential pattern mining algorithms have been proposed in academia. They can be roughly divided into three categories: algorithms based on the Apriori property, algorithms based on vertical grids, and algorithms based on projection databases, as shown in Table 1. Most of the early sequential pattern mining algorithms were developed based on the Apriori property. They can effectively mine frequent patterns, but they need to scan the database multiple times and generate a large number of candidate sequences; for the second type of algo...
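The cost of the Apriori-style category described above can be illustrated with a minimal sketch. It is not taken from the patent: the function names (`is_subsequence`, `support`, `apriori_sequences`), the toy database, and the restriction to sequences of single items are all assumptions made for illustration. The sketch mines level by level, and counts how many candidates are generated compared with how many turn out to be frequent.

```python
from typing import List, Tuple

Sequence = Tuple[str, ...]


def is_subsequence(pattern: Sequence, sequence: Sequence) -> bool:
    """True if the items of pattern occur in sequence in the same order."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern: Sequence, database: List[Sequence]) -> int:
    """Number of sequences in the database that contain the pattern."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def apriori_sequences(database: List[Sequence], min_support: int):
    """Level-wise mining (hypothetical helper): every level generates
    candidates and then scans the whole database to count their support."""
    items = sorted({item for seq in database for item in seq})
    candidates_generated = len(items)
    level = [(i,) for i in items if support((i,), database) >= min_support]
    frequent_items = [p[0] for p in level]
    frequent = list(level)
    while level:
        # Extend each frequent k-sequence by each frequent item.
        candidates = [p + (i,) for p in level for i in frequent_items]
        candidates_generated += len(candidates)
        # One more full pass over the database for this level.
        level = [c for c in candidates if support(c, database) >= min_support]
        frequent.extend(level)
    return frequent, candidates_generated


toy_logs = [("a", "b", "c"), ("a", "c"), ("b", "c"), ("a", "b", "c")]
frequent, generated = apriori_sequences(toy_logs, min_support=2)
print(len(frequent), "frequent sequences from", generated, "candidates")
```

Even on this toy input, most generated candidates are discarded after a full database pass, which is the overhead the paragraph attributes to Apriori-based algorithms.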

Abstract

The invention relates to a maximal frequent sequential pattern mining method based on distributed logs. The method comprises the following steps. First, on the basis of the Spark distributed computation framework, local maximal frequent sequences are extracted: prefix projection is used to divide the search space, and the local maximal frequent sequences are extracted recursively. In this step, the frequent 1-sequences are used to delete infrequent items from the log sequence dataset, which reduces the scale of database scanning, and the correspondence between frequent sequential patterns and maximal frequent sequential patterns is used to reduce the number of candidate sequences. Second, global maximal frequent sequences are extracted: the local maximal frequent sequences are stored according to their lengths, and superset detection is performed on sequential patterns of adjacent lengths to judge whether a superset relationship exists; if it does, the redundant sequence is deleted, and the global maximal frequent sequences are extracted. The method has higher efficiency and supports the mining of larger-scale event sequence data.
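The steps named in the abstract can be sketched on a single machine as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it does not use Spark, it treats each log sequence as a tuple of single items, and the function names (`prune_infrequent_items`, `prefix_span`, `keep_maximal`) are hypothetical. It also enumerates all frequent sequences before filtering, so the candidate-reduction step based on the frequent/maximal correspondence is not reproduced.

```python
from collections import Counter, defaultdict


def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in the same order."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def prune_infrequent_items(database, min_support):
    """Delete items whose support is below the threshold, so that later
    scans touch shorter sequences."""
    counts = Counter(item for seq in database for item in set(seq))
    keep = {item for item, c in counts.items() if c >= min_support}
    return [tuple(i for i in seq if i in keep) for seq in database]


def prefix_span(database, min_support, prefix=()):
    """Recursively grow the prefix; each frequent item splits the search
    space into its own projected (suffix) database."""
    counts = Counter(item for seq in database for item in set(seq))
    patterns = []
    for item, count in sorted(counts.items()):
        if count < min_support:
            continue
        new_prefix = prefix + (item,)
        patterns.append(new_prefix)
        # Suffix after the first occurrence of item in each sequence.
        projected = [seq[seq.index(item) + 1:] for seq in database if item in seq]
        patterns.extend(prefix_span(projected, min_support, new_prefix))
    return patterns


def keep_maximal(patterns):
    """Store patterns by length and drop any pattern that has a
    supersequence among the patterns one item longer."""
    by_length = defaultdict(set)
    for p in patterns:
        by_length[len(p)].add(p)
    maximal = []
    for length in sorted(by_length):
        longer = by_length.get(length + 1, ())
        for p in by_length[length]:
            if not any(is_subsequence(p, q) for q in longer):
                maximal.append(p)
    return sorted(maximal)


logs = [("a", "b", "c"), ("a", "x", "c"), ("b", "c"), ("a", "b", "c", "y")]
pruned = prune_infrequent_items(logs, min_support=2)
print(keep_maximal(prefix_span(pruned, min_support=2)))
```

Checking only the adjacent length is sufficient in this sketch because the input to `keep_maximal` is the complete set of frequent sequences, where every frequent sequence with a longer frequent supersequence also has one that is exactly one item longer. Whether the same holds when merging local results from several partitions depends on details the abstract does not give.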

Description

Technical field

[0001] The invention relates to a maximal frequent sequential pattern mining method based on distributed logs.

Background technique

[0002] With the rapid development of technologies such as cloud computing, the Internet of Things, and big data, distributed server systems have become the mainstream environment for various application businesses. Various types of user access and service provision make system applications more and more reliable. Analysis of user logs or system service log information is becoming more and more important. The log information in the system is distributed. The traditional method is to concentrate the log information of a distributed environment on one computer for analysis and processing, and to obtain the various status information required for system operation and maintenance by mining frequent sequential patterns. However, this centralized analysis and mining method incurs a huge amount of resource consumption and communication overhead. At th...

Application Information

IPC(8): G06F17/30
Inventor: 肖如良, 陈雄, 蔡声镇, 陈黎飞, 许力, 倪友聪
Owner: FUJIAN NORMAL UNIV