Memory caching method oriented to range querying on Hadoop

A memory caching technology for range queries, applied in the information field. It addresses frequent swapping of data into and out of memory (memory thrashing), the inability to build the cache around query requirements, and the inability to adjust cache granularity. As a result, it improves the hit rate and performance and reduces overhead.

Status: Inactive; Publication Date: 2014-07-23
GUANGXI NORMAL UNIV
Cites: 2; Cited by: 42

AI Technical Summary

Problems solved by technology

As memory performance improves and prices continue to fall, memory caching, long common in the database field, is playing a growing role in massive-data querying. HBase also provides a cache for data reads and writes: a hotspot caching mechanism applied to all read and write requests, based ...




Embodiment

[0035] A memory caching method for range queries on Hadoop comprises the following steps:

[0036] 1) Build an index on the query attributes of the Hadoop mass data and store the index in HBase. Because HBase offers good scalability and fault tolerance, it can be treated as having effectively unlimited disk space, and the data stored in it can be regarded as safe and reliable. HBase distributes data across the nodes of the cluster, and each node manages portions of the data called Regions. Data within a Region is stored contiguously in primary-key (row-key) order, which is what allows HBase to support efficient range queries;
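Because each Region holds a contiguous, sorted slice of the row-key space, a range query only needs to visit the Regions whose key intervals overlap the requested range. The following is a minimal sketch of that lookup, assuming region boundaries are available as a sorted list of start keys whose first entry is the empty key, and that query ranges are half-open [lo, hi). The helper `regions_for_range` is hypothetical and is not part of the HBase client API.

```python
from bisect import bisect_left, bisect_right
from typing import List


def regions_for_range(start_keys: List[bytes], lo: bytes, hi: bytes) -> List[int]:
    """Return indices of regions whose key interval overlaps [lo, hi).

    Assumes start_keys is sorted ascending and start_keys[0] == b"", so region i
    covers [start_keys[i], start_keys[i + 1]) and the last region is unbounded.
    """
    if lo >= hi:
        return []
    # Region containing lo: the last region whose start key is <= lo.
    first = bisect_right(start_keys, lo) - 1
    # Last region that starts strictly before hi; a region starting at hi cannot overlap.
    last = bisect_left(start_keys, hi) - 1
    return list(range(first, last + 1))
```

For example, with start keys `[b"", b"g", b"n", b"t"]`, the query `[b"h", b"p")` touches only regions 1 and 2, so the other regions are never read.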

[0037] 2) Build a memory fragment cache on the HBase index data. The goal of the cache is to select the most frequently accessed index data and keep them in memory, reducing the disk I/O (input/output) overhead of queries. Because the cache must support efficient range queries, in the data structure of the memory cache the present invention establishes ...
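The abstract states that the cached data fragments are organized with a skip list, which keeps fragments sorted by key and finds the fragments covering a range in expected logarithmic time. Below is a minimal sketch under stated assumptions: each fragment is identified by its start key, fragments tile the key space (a fragment covers [its start, the next start)), and the payload stands in for the fragment's collection of cached index entries. The class `FragmentSkipList` and its methods are hypothetical illustrations, not the patent's data structure.

```python
import random
from typing import Any, List, Optional


class _Node:
    """One skip-list node: a fragment start key, its payload, and forward links."""

    __slots__ = ("key", "value", "forward")

    def __init__(self, key: Any, value: Any, level: int) -> None:
        self.key = key
        self.value = value
        self.forward: List[Optional["_Node"]] = [None] * level


class FragmentSkipList:
    """Hypothetical skip list mapping fragment start key -> fragment payload.

    Assumes fragments tile the key space: the fragment starting at key k
    covers [k, next start key), and the last fragment is unbounded above.
    """

    MAX_LEVEL = 16
    P = 0.5  # probability of promoting a node one level up

    def __init__(self, seed: Optional[int] = None) -> None:
        self._rng = random.Random(seed)
        self._head = _Node(None, None, self.MAX_LEVEL)  # sentinel, key never compared
        self._level = 1

    def _random_level(self) -> int:
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < self.P:
            level += 1
        return level

    def _predecessors(self, key: Any) -> List[_Node]:
        """For every level, the last node whose key is strictly less than key."""
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for lvl in range(self._level - 1, -1, -1):
            nxt = node.forward[lvl]
            while nxt is not None and nxt.key < key:
                node = nxt
                nxt = node.forward[lvl]
            update[lvl] = node
        return update

    def insert(self, key: Any, value: Any) -> None:
        update = self._predecessors(key)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == key:
            nxt.value = value  # replace the payload of an existing fragment
            return
        level = self._random_level()
        self._level = max(self._level, level)  # levels above the old top start at head
        node = _Node(key, value, level)
        for lvl in range(level):
            node.forward[lvl] = update[lvl].forward[lvl]
            update[lvl].forward[lvl] = node

    def range_fragments(self, lo: Any, hi: Any) -> List[Any]:
        """Payloads of all fragments overlapping the half-open range [lo, hi)."""
        pred = self._predecessors(lo)[0]
        first = pred.forward[0]
        # A fragment starting before lo still covers lo unless another starts exactly at lo.
        if (first is None or first.key != lo) and pred is not self._head:
            first = pred
        result = []
        node = first
        while node is not None and node.key < hi:
            result.append(node.value)
            node = node.forward[0]
        return result
```

Only level 0 is walked once the first overlapping fragment is found, so a range query costs the skip-list search plus one step per fragment returned.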


Abstract

The invention discloses a memory caching method for range queries on Hadoop. The method comprises the following steps: (1) building an index on the query attributes of Hadoop mass data and storing it in HBase; (2) building a memory fragment cache on the HBase index data, in which frequently accessed index data are selected and cached in memory, the data are initially partitioned into fragments by a fixed-length equal-division method, and the large number of resulting data fragments are organized with a skip list; (3) recording the data hit by each query and measuring the heat of each data fragment by exponential smoothing; (4) updating the memory cache. The method combines a skip list with collections and supports dynamic adjustment of collection fragment boundaries on this structure, so that the data fragments adapt to query demand. This raises the cache hit rate for hot data fragments, lowers the disk-access overhead of queries, and thereby substantially improves range-query performance.
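The abstract names three mechanisms (fixed-length equal division for the initial fragments, exponential smoothing for fragment heat, and a cache update) without giving parameters. The sketch below illustrates them under explicit assumptions: integer keys, hit counts collected per period, heat updated as H_t = alpha * h_t + (1 - alpha) * H_{t-1} with H_0 = 0 and a default alpha of 0.5, and a simple "keep the k hottest fragments" update policy. The names `equal_fragments` and `FragmentHeat` are hypothetical, and the patent's dynamic boundary adjustment is not modeled.

```python
from typing import Dict, List, Tuple

Fragment = Tuple[int, int]  # half-open key interval [start, end)


def equal_fragments(lo: int, hi: int, n: int) -> List[Fragment]:
    """Fixed-length equal division of the integer key range [lo, hi) into n fragments.

    Fragment lengths differ by at most one when (hi - lo) is not divisible by n.
    """
    if n <= 0 or hi <= lo:
        raise ValueError("need n > 0 and hi > lo")
    bounds = [lo + i * (hi - lo) // n for i in range(n + 1)]
    return list(zip(bounds, bounds[1:]))


class FragmentHeat:
    """Exponentially smoothed heat per fragment: H_t = alpha*h_t + (1 - alpha)*H_{t-1}.

    h_t is the number of query hits on the fragment during period t; H_0 is
    assumed to be 0. alpha, the period length and the top-k update policy
    are illustrative assumptions, not values taken from the patent.
    """

    def __init__(self, alpha: float = 0.5) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.heat: Dict[Fragment, float] = {}
        self._hits: Dict[Fragment, int] = {}

    def record_hit(self, frag: Fragment) -> None:
        """Count one query hit on frag in the current period."""
        self._hits[frag] = self._hits.get(frag, 0) + 1

    def end_period(self) -> None:
        """Fold this period's hit counts into the smoothed heat and reset them."""
        for frag in set(self.heat) | set(self._hits):
            hits = self._hits.get(frag, 0)
            prev = self.heat.get(frag, 0.0)
            self.heat[frag] = self.alpha * hits + (1.0 - self.alpha) * prev
        self._hits.clear()

    def hottest(self, capacity: int) -> List[Fragment]:
        """The `capacity` hottest fragments to keep cached (assumed update policy)."""
        ranked = sorted(self.heat, key=lambda f: (-self.heat[f], f))
        return ranked[:capacity]
```

With alpha = 0.5, a fragment hit 4 times in one period and not at all in the next has heat 2.0 and then 1.0, so recent activity dominates while older hits decay geometrically; a larger alpha makes the cache react faster to shifts in the query workload.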

Description

Technical field

[0001] The present invention relates to information technology, in particular big data query technology, and specifically to a range-query-oriented memory caching method on Hadoop (an open-source distributed parallel architecture developed by the Apache Software Foundation).

Background art

[0002] Massive data storage and query technology has made great progress in recent years. Since Google proposed Bigtable (Google's non-relational data storage system), a NoSQL (non-relational data model) store, more than a dozen influential NoSQL database products have appeared, such as Hadoop HBase (the non-relational data storage system of the Hadoop distributed parallel architecture), Yahoo PNUTS (Yahoo's non-relational data storage system), Apache Cassandra (originally Facebook's non-relational data storage system, later released as an open-source Apache project), Amazon Dynamo (Amazon's non-relational data storage system), and Hypertable (an open s...


Application Information

IPC (8): G06F17/30
CPC: G06F16/24552
Inventors: 李先贤 (Li Xianxian), 葛微 (Ge Wei)
Owner: GUANGXI NORMAL UNIV