Denormalization strategy selection method based on frequent item set mining algorithm

A method that applies frequent itemset mining to the selection of database denormalization strategies, in the fields of computing, structured data retrieval, and special-purpose data processing, with the aim of resolving performance bottlenecks.

Active Publication Date: 2014-05-28
UNIV OF ELECTRONIC SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0018] Foreign researchers have carried out more in-depth wor

Examples


example 1

[0065] Example 1: The denormalization strategy selection method based on the frequent itemset mining algorithm is characterized in that it includes the following steps:

[0066] 1-(a). Step of obtaining the database log file: obtain the database log file to be analyzed;

[0067] 1-(b). Log parsing step: analyze the SELECT statements in the log and extract the table names and field names they involve as transaction items; then obtain the transaction records that involve cross-table queries, or the transaction records that involve only single-table queries;

[0068] 1-(c). Data mining step: this step performs frequent pattern mining based on a simplified prefix tree and comprises three parts in sequence:

[0069] (c-1). Step of establishing the FP-tree: read the transaction record set and build a frequent pattern tree (FP-tree) based on the preset empirical support value; the persistence threshold, determined by analyzing a large number of denormalization examples, is ...
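
The log parsing step 1-(b) can be illustrated with a minimal Python sketch. It assumes a hypothetical log with one SQL statement per line and fully qualified column references (table.field); the function names and sample statements are invented for illustration and are not taken from the patent. A regular expression is a simplification: table aliases and dotted text inside string literals would need a real SQL parser.

```python
import re

# Matches qualified column references such as course.academy_id.
QUALIFIED_FIELD = re.compile(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b")


def extract_items(sql):
    """Return the sorted table.field items referenced by a SELECT statement."""
    if not sql.lstrip().upper().startswith("SELECT"):
        return []  # only SELECT statements become transactions
    return sorted({f"{table}.{field}" for table, field in QUALIFIED_FIELD.findall(sql)})


def tables_of(items):
    """Return the set of table names that appear in a list of items."""
    return {item.split(".")[0] for item in items}


def parse_log(lines):
    """Split SELECT statements into cross-table and single-table transactions."""
    cross, single = [], []
    for line in lines:
        items = extract_items(line)
        if not items:
            continue
        (cross if len(tables_of(items)) > 1 else single).append(items)
    return cross, single


# Hypothetical log lines, for illustration only.
log = [
    "SELECT course.course_id FROM course, academy "
    "WHERE course.academy_id = academy.academy_id",
    "SELECT teacher.teacher_id FROM teacher",
    "UPDATE course SET course.course_id = 1",
]
cross, single = parse_log(log)
print(cross)
print(single)
```

Each transaction is a list of table.field items, which is the form the later mining step consumes.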

example 2

[0093] Obtain the database log file to be analyzed: Assume a simplified test data set TestSet, as shown in Table 1.

[0094] The default support count threshold is 3.

[0095]

[0096] Table 1: Test dataset TestSet

[0097] Analyze the SELECT statements in the log and extract the table names and field names involved as transaction items. After the test data set TestSet is read into memory, the resulting frequent 1-itemset (or item sequence conversion table) is as shown in Table 2.

[0098]
No.  Item
0    course.academy_id
1    academy.academy_id
2    course.course_id
3    teacher.teacher_id
4    give_lesson.givelesson_id

[0099] Table 2: Frequent 1-itemset (item sequence conversion table)
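
How such an item sequence conversion table might be derived can be sketched in Python. Because Table 1 is not reproduced in this text, the transactions below are hypothetical; student.student_id is an invented item included only to show an infrequent item being filtered out. Ordering items by descending support count, with ties broken by name, is the usual FP-growth convention and is an assumption here, not something the source states.

```python
from collections import Counter

MIN_SUPPORT_COUNT = 3  # support count threshold given in the example

# Hypothetical transactions (Table 1 is not reproduced in this text).
transactions = [
    ["course.academy_id", "academy.academy_id", "course.course_id"],
    ["course.academy_id", "academy.academy_id"],
    ["course.academy_id", "academy.academy_id", "teacher.teacher_id"],
    ["course.course_id", "teacher.teacher_id", "student.student_id"],
    ["course.academy_id", "course.course_id", "teacher.teacher_id"],
]


def item_conversion_table(transactions, min_count):
    """Map each frequent item to a serial number, most frequent first."""
    # Count each item once per transaction (support count).
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [item for item, n in counts.items() if n >= min_count]
    # Assumed ordering: descending support, ties broken alphabetically.
    frequent.sort(key=lambda item: (-counts[item], item))
    return {item: number for number, item in enumerate(frequent)}


table = item_conversion_table(transactions, MIN_SUPPORT_COUNT)
for item, number in table.items():
    print(number, item)
```

With this made-up data, four items reach the threshold of 3 and student.student_id is dropped.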

[0100] As shown in Figure 2, read the transaction data set into memory and filter it according to the preset support threshold to obtain the frequent 1-itemsets;

[0101] Insert all frequent items of the transaction set into the FP-tree;

[0102] Su...
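
The filtering and tree-building steps above can be sketched as a textbook FP-tree construction in Python. This is not the patent's simplified prefix tree or its counting method, whose details are truncated in this text; the transactions, the Node class, and the invented item student.student_id are all assumptions made for illustration.

```python
from collections import Counter


class Node:
    """One FP-tree node: an item, its count, and its child nodes."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    """Build a standard FP-tree from transactions, dropping infrequent items."""
    counts = Counter(item for t in transactions for item in set(t))
    # Sort key per frequent item: descending support, then name.
    rank = {item: (-n, item) for item, n in counts.items() if n >= min_count}
    root = Node(None)
    for t in transactions:
        path = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for item in path:
            # Reuse a shared prefix if present, otherwise add a new child.
            node = node.children.setdefault(item, Node(item, node))
            node.count += 1
    return root


# Hypothetical transactions (Table 1 is not reproduced in this text).
transactions = [
    ["course.academy_id", "academy.academy_id", "course.course_id"],
    ["course.academy_id", "academy.academy_id"],
    ["course.academy_id", "academy.academy_id", "teacher.teacher_id"],
    ["course.course_id", "teacher.teacher_id", "student.student_id"],
    ["course.academy_id", "course.course_id", "teacher.teacher_id"],
]

root = build_fp_tree(transactions, min_count=3)
top = root.children["course.academy_id"]
print(top.count, top.children["academy.academy_id"].count)
```

Transactions that share a prefix of frequent items share a path in the tree, so the count on each node records how many transactions pass through it.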

example 3

[0109] Obtain the database log file to be analyzed: Assume a simplified test data set TestSet, as shown in Table 1.

[0110] The default support count threshold is 3.

[0111]

[0112] Table 1: Test dataset TestSet

[0113] Analyze the SELECT statements in the log and extract the table names and field names involved as transaction items. After the test data set TestSet is read into memory, the resulting frequent 1-itemset (or item sequence conversion table) is as shown in Table 2.

[0114]
No.  Item
0    course.academy_id
1    academy.academy_id
2    course.course_id
3    teacher.teacher_id
4    give_lesson.givelesson_id

[0115] Table 2: Frequent 1-itemset (item sequence conversion table)

[0116] As shown in Figure 2, read the transaction data set into memory and filter it according to the preset support threshold to obtain the frequent 1-itemsets;

[0117] Insert all frequent items of the transaction set into the FP-tree;

[0118] Su...

Abstract

The invention discloses a denormalization strategy selection method based on a frequent itemset mining algorithm, and in particular such a method for massive data sets. Frequent pattern mining is applied to guiding database denormalization for the first time; building on a frequent pattern mining algorithm that uses a concise tree, the invention provides a new procedure for constructing the concise tree and a correct counting method, both serving database denormalization selection. The advantage of the method is that the frequent itemset mining algorithm from association rule mining discovers important associations or relationships among itemsets in massive data, which guide DBAs and others in selecting and building database denormalization strategies, and thereby addresses the performance bottleneck caused by large numbers of table joins on massive data.

Description

Technical field

[0001] The invention relates to a denormalization strategy selection method, in particular to a denormalization strategy selection method based on a frequent itemset mining algorithm on massive data sets.

Background

[0002] A relational database must be constructed according to certain rules, called normal forms. The higher the normal form, the stricter the requirements on the database design. As the normal form rises, database redundancy decreases step by step and data consistency improves step by step. However, relational database theory also has shortcomings. The higher the normal form, the finer the data model, which means more tables and therefore more table join operations while the program runs. Although some database systems support stored procedures and other techniques, these do not bring revolutionary efficiency improvements, especially when the data of two or more tabl...

Application Information

IPC(8): G06F17/30
CPC: G06F16/284
Inventor: 牛新征, 周冬梅, 侯孟书, 杨健
Owner UNIV OF ELECTRONIC SCI & TECH OF CHINA