A data caching method based on the Apriori algorithm

A data caching technique, applied in the field of data query, that addresses the inability of existing caches to track query statistics and to cache the data most likely to be queried. It improves data query efficiency and reduces retrieval pressure on the database.

Status: Inactive
Publication Date: 2017-02-22
NORTHEASTERN UNIV LIAONING
3 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Most current mainstream NoSQL databases implement their data caching mechanism with the LIRS algorithm. However, LIRS cannot effectively track which data is queried frequently over a long period of time, so it cannot apply a targeted strategy to cache the data that is about to be queried.

Examples

Embodiment Construction

[0029] The specific embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0030] This embodiment runs in a Hadoop-HBase environment. Sina Weibo user data is used to simulate both the queried data and user query behavior. With T = 7, the simulated data is divided into 7 equal parts, each standing in for the query log of a different time period.
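
The split into T parts can be illustrated with a minimal sketch. The function name `split_into_periods`, the contiguous in-order split, and the handling of a record count that is not a multiple of T are all assumptions made for illustration; the source only states that the data is divided into 7 equal parts.

```python
def split_into_periods(records, t=7):
    """Split records into t contiguous parts whose sizes differ by at most one.

    Hypothetical helper: the source only says the simulated data is divided
    into T equal parts; spreading a remainder over the first parts is an
    assumption made here.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    base, extra = divmod(len(records), t)
    parts, start = [], 0
    for i in range(t):
        size = base + (1 if i < extra else 0)  # first `extra` parts get one more record
        parts.append(records[start:start + size])
        start += size
    return parts


# 70 placeholder records stand in for the simulated query log.
periods = split_into_periods(list(range(70)), t=7)
```

Each element of `periods` can then be treated as the query log of one time period.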

[0031] HBase is a column-oriented NoSQL database that runs on the HDFS file system as part of the Hadoop project. For reads, HBase uses column-based storage. Compared with row-based storage, this avoids reading redundant data, which improves read efficiency and makes data retrieval faster. For storage, HBase divides a large data table into several data areas, that is, data blocks. Each area stores a certain number of records from the data table in sequence. By merging multiple related areas,...

Abstract

The invention discloses a data caching method based on the Apriori algorithm. A query log is established for the condition attributes of the data on disk, and the query frequency of each data block is computed; the data blocks with high query frequency form a frequent data block set. The query frequency of each condition attribute within the frequent data block set is then computed, and the condition attributes with high query frequency form a frequent condition attribute set. The Apriori algorithm is applied with query frequency mapped to support, yielding the set of frequent condition attribute groups. The data corresponding to these groups is cached in memory, and an index is built on the frequent condition attributes. The method noticeably improves query efficiency in frequently queried regions. Caching groups of condition attributes gives higher query efficiency than caching single condition attributes, and it reduces the search pressure on the database.
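
The mining steps in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation. The log format (a list of `(block_id, attribute_set)` pairs), the function names, the thresholds, and the definition of support as the fraction of logged queries on frequent blocks that contain an attribute group are all hypothetical choices. The caching and indexing steps are omitted.

```python
from collections import Counter
from itertools import combinations


def frequent_blocks(log, min_block_count):
    """Return the blocks queried at least min_block_count times (assumed threshold)."""
    counts = Counter(block for block, _ in log)
    return {b for b, c in counts.items() if c >= min_block_count}


def apriori_attribute_groups(log, hot_blocks, min_support):
    """Return {attribute group: support} for groups with support >= min_support.

    Only queries that hit a frequent block are used as transactions.
    """
    transactions = [frozenset(attrs) for block, attrs in log if block in hot_blocks]
    n = len(transactions)
    if n == 0:
        return {}

    def support(group):
        # Fraction of transactions that contain every attribute in the group.
        return sum(1 for t in transactions if group <= t) / n

    # Level 1: frequent single condition attributes.
    singles = {frozenset([a]) for t in transactions for a in t}
    current = {g: s for g in singles if (s := support(g)) >= min_support}
    result = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-groups into candidate k-groups.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = {c: s for c in candidates if (s := support(c)) >= min_support}
        result.update(current)
        k += 1
    return result


# Hypothetical query log: (data block id, condition attributes used in the query).
log = [
    ("b1", {"city", "gender"}),
    ("b1", {"city", "gender", "age"}),
    ("b1", {"city"}),
    ("b2", {"city", "gender"}),
    ("b2", {"age"}),
    ("b3", {"followers"}),
]
hot = frequent_blocks(log, min_block_count=2)
groups = apriori_attribute_groups(log, hot, min_support=0.5)
```

In this toy log, the groups with more than one attribute are the candidates whose data would be cached together. How the thresholds are chosen and how the cache and index are built are left to the full description.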

Description

Technical field

[0001] The invention belongs to the technical field of data query, and in particular relates to a data caching method based on the Apriori algorithm.

Background

[0002] In recent years, with the rapid development of the Internet, and especially the rise of social applications such as Weibo and WeChat, the amount of data has exploded. In 2011, the world officially entered the zettabyte (ZB) era, and it is fair to say that we now live in the era of big data. From the outset, however, big data has been characterized by low value density and a wide variety of types, which means that querying massive data raises many problems. When the data size is moderate, traditional relational databases perform well, are highly stable, and have stood the test of time. Once the amount of data reaches a certain scale, however, the efficiency of relational databases becomes unacceptably low. All in all, relational databases cannot me...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30, G06F11/34
CPC: G06F16/2228, G06F16/24552
Inventor: 张莉郭昆杨乐游
Owner: NORTHEASTERN UNIV LIAONING