Memory caching method oriented to range querying on Hadoop

A memory caching technique for range queries, applied in the information technology field. It addresses frequent swapping of data into and out of memory (thrashing), the inability to build a cache around specific query requirements, and the inability to adjust cache granularity, with the aim of improving the cache hit rate, improving query performance, and reducing disk access overhead.

Status: Inactive
Publication Date: 2014-07-23

GUANGXI NORMAL UNIV

2 Cites, 42 Cited by

Problems solved by technology

As memory performance improves and prices continue to fall, the in-memory caching methods commonly used in the database field are increasingly applied to massive data querying. HBase also provides a cache for data reads and writes. That cache is a hotspot caching mechanism covering all read and write requests, and it operates on data blocks (64KB). It cannot build a cache for specific query requirements on specific data, nor can it adjust the cache granularity.

In addition, HBase's existing caching mechanism uses a simple least recently used (LRU) algorithm to identify hot data. LRU does not take the historical access pattern of the data into account, so data is prone to being moved into and out of memory frequently.
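The weakness of plain LRU can be illustrated with a small simulation. This is a minimal sketch, not HBase's actual block cache: the `LRUCache` class, its capacity, and the key names are hypothetical and chosen only to show how a single one-off scan of cold keys can evict a hot working set that had been hit repeatedly.

```python
from collections import OrderedDict


class LRUCache:
    """Toy least-recently-used cache (hypothetical, for illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def access(self, key):
        """Touch a key; return True on a hit, False on a miss."""
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return False


def run_demo():
    cache = LRUCache(capacity=4)
    hot_keys = ["a", "b", "c", "d"]
    # The hot keys are read three times each, so they have a long hit history.
    for _ in range(3):
        for key in hot_keys:
            cache.access(key)
    # A one-off scan touches four cold keys exactly once.
    for key in ["s1", "s2", "s3", "s4"]:
        cache.access(key)
    survivors = [key for key in hot_keys if key in cache.entries]
    return survivors, cache.hits, cache.misses


survivors, hits, misses = run_demo()
print("hot keys still cached:", survivors)
print("hits:", hits, "misses:", misses)
```

Because LRU ranks entries only by their most recent access, the history of repeated hits on the hot keys plays no part in the eviction decision.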

Embodiment

[0035] A memory caching method for range queries on Hadoop, comprising the following steps:

[0036] 1) Build an index on the query attribute of the massive data in Hadoop and store the index in HBase. Because HBase offers good scalability and fault tolerance, it can be treated as having unlimited disk space, and the data stored in it as safe and reliable. HBase distributes data across the nodes of the cluster; each node manages a portion of the data, called a Region. Within a Region the data is stored contiguously in primary key order, which HBase exploits to support efficient range queries;
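The idea that key-ordered storage makes range queries cheap can be sketched in a few lines. The example below is an in-memory stand-in, not the HBase client API: `SortedIndex`, `put`, and `range_scan` are hypothetical names, and a sorted Python list plays the role of the key-ordered index data. Each entry is keyed by the query attribute value followed by a record identifier, so a range query becomes two binary searches and one contiguous read.

```python
import bisect


class SortedIndex:
    """Toy key-ordered index (hypothetical stand-in for index data in HBase)."""

    def __init__(self):
        # Entries are (attribute_value, record_id) tuples kept in sorted order.
        self._keys = []

    def put(self, value, record_id):
        """Insert an index entry while preserving key order."""
        bisect.insort(self._keys, (value, record_id))

    def range_scan(self, low, high):
        """Return record ids whose attribute value v satisfies low <= v < high."""
        # A 1-tuple (x,) sorts before every (x, record_id), so bisect_left
        # finds the first entry whose attribute value is >= x.
        start = bisect.bisect_left(self._keys, (low,))
        end = bisect.bisect_left(self._keys, (high,))
        return [record_id for _, record_id in self._keys[start:end]]


index = SortedIndex()
for value, record_id in [(30, "r3"), (10, "r1"), (20, "r2"), (20, "r2b"), (40, "r4")]:
    index.put(value, record_id)

print(index.range_scan(20, 40))
```

The scan touches only the contiguous run of entries inside the requested range, which is the property the step above relies on.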

[0037] 2) Build an in-memory slice cache over the HBase index data. The goal of the cache is to keep the more frequently accessed index data in memory, reducing the disk I/O (input/output) overhead of data queries. Because the cache must support efficient range queries, in the data structure of the memory cache, the present invention establishes ...
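A slice cache of this kind can be sketched as follows. The text above is truncated, so this is only an illustration under stated assumptions: index keys are integers, slices have a fixed equal width (the initial slicing described in the abstract), and the names `SliceCache`, `slices_for_range`, `load_slice`, and `fetch_from_store` are hypothetical. A range query is split into the slices it overlaps; cached slices are answered from memory and the rest are fetched from the backing store.

```python
def slices_for_range(low, high, slice_width):
    """Return ids of the fixed-width slices that overlap [low, high)."""
    if high <= low:
        return []
    first = low // slice_width
    last = (high - 1) // slice_width
    return list(range(first, last + 1))


class SliceCache:
    """Toy in-memory slice cache over index data (hypothetical sketch)."""

    def __init__(self, slice_width):
        self.slice_width = slice_width
        self.cached = {}  # slice id -> list of (key, record_id) entries

    def load_slice(self, slice_id, entries):
        """Place one slice of index entries in memory."""
        self.cached[slice_id] = list(entries)

    def query(self, low, high, fetch_from_store):
        """Answer [low, high); return (record ids, hit slice ids, missed slice ids)."""
        results, hits, misses = [], [], []
        for slice_id in slices_for_range(low, high, self.slice_width):
            if slice_id in self.cached:
                entries = self.cached[slice_id]
                hits.append(slice_id)
            else:
                # Assumption: the caller supplies a function that reads one
                # slice from the backing store (disk I/O in the real system).
                entries = fetch_from_store(slice_id)
                misses.append(slice_id)
            results.extend(rid for key, rid in entries if low <= key < high)
        return results, hits, misses


# Backing store with slice width 10: slice 0 holds keys 0..9, slice 1 holds 10..19, ...
store = {0: [(3, "r3"), (7, "r7")], 1: [(12, "r12")], 2: [(25, "r25")]}
cache = SliceCache(slice_width=10)
cache.load_slice(1, store[1])

results, hits, misses = cache.query(5, 26, lambda sid: store.get(sid, []))
print(results, hits, misses)
```

Recording which slices were hit and which were missed for each query is what later allows per-slice heat to be measured.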


Abstract

The invention discloses a memory caching method for range queries on Hadoop. The method comprises the following steps: (1) an index is built on the query attributes of the Hadoop mass data and stored in HBase; (2) an in-memory fragment cache is built over the HBase index data: frequently accessed index data is selected and kept in memory, the data is initially divided into fragments of fixed, equal length, and the large number of fragments is organized with a skiplist; (3) queries are executed and the data they hit is recorded, and the heat of each data fragment is measured by exponential smoothing; (4) the memory cache is updated. The method combines a skiplist with a collection, a structure that supports dynamic adjustment of the collection's fragment boundaries, so that the data fragments adapt to query demand. This improves the cache hit rate of queries on hot data fragments, lowers the disk access overhead of queries, and thereby greatly improves range query performance.
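Steps (3) and (4) can be illustrated with a short sketch. The abstract names exponential smoothing but not its parameters, so everything specific here is an assumption: the standard simple form heat_t = alpha * hits_t + (1 - alpha) * heat_(t-1), the value alpha = 0.5, the per-period hit counting, and the names `smooth_heat`, `HeatTracker`, and `hottest`. The skiplist and the dynamic adjustment of fragment boundaries are not modelled.

```python
def smooth_heat(previous_heat, hits_this_period, alpha=0.5):
    """Simple exponential smoothing of a fragment's hit count (assumed form)."""
    return alpha * hits_this_period + (1 - alpha) * previous_heat


class HeatTracker:
    """Toy per-fragment heat tracker (hypothetical sketch)."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.heat = {}          # fragment id -> smoothed heat
        self._period_hits = {}  # fragment id -> hits in the current period

    def record_hit(self, fragment_id):
        """Step (3): record that a query touched this fragment."""
        self._period_hits[fragment_id] = self._period_hits.get(fragment_id, 0) + 1

    def end_period(self):
        """Fold the current period's hit counts into the smoothed heat."""
        for fragment_id in set(self.heat) | set(self._period_hits):
            self.heat[fragment_id] = smooth_heat(
                self.heat.get(fragment_id, 0.0),
                self._period_hits.get(fragment_id, 0),
                self.alpha,
            )
        self._period_hits = {}

    def hottest(self, capacity):
        """Step (4): choose which fragments to keep cached, hottest first."""
        ranked = sorted(self.heat, key=lambda fid: (-self.heat[fid], fid))
        return ranked[:capacity]


tracker = HeatTracker(alpha=0.5)
# Period 1: fragment 1 is hit 8 times, fragment 2 not at all.
for _ in range(8):
    tracker.record_hit(1)
tracker.end_period()
# Period 2: fragment 2 has a short burst of 4 hits, fragment 1 gets 2 hits.
for _ in range(4):
    tracker.record_hit(2)
for _ in range(2):
    tracker.record_hit(1)
tracker.end_period()

print(tracker.heat)
print(tracker.hottest(capacity=1))
```

Unlike a pure recency rule, the smoothed value carries part of each fragment's earlier hit history forward, so a brief burst on a cold fragment does not immediately displace a fragment that has been hot over several periods. A larger `alpha` weights recent periods more heavily; a smaller one weights history more.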

Description

Technical field

[0001] The present invention relates to information technology, in particular big data query technology, and specifically to a memory caching method for range queries on Hadoop (an open source distributed parallel architecture developed by the Apache Software Foundation).

Background

[0002] Massive data storage and query technology has made great progress in recent years. Since Google proposed Bigtable, its NoSQL (non-relational data model) storage system, more than a dozen influential NoSQL database products have appeared, such as Hadoop HBase (the non-relational data storage system in the Hadoop distributed parallel architecture), Yahoo PNUTS (Yahoo's non-relational data storage system), Apache Cassandra (originally Facebook's non-relational data storage system, later released as an open source project under the Apache Software Foundation), Amazon Dynamo (Amazon's non-relational data storage system), and Hypertable (an open s...


Application Information


IPC(8): G06F17/30

CPC: G06F16/24552

Inventors: 李先贤, 葛微

Owner: GUANGXI NORMAL UNIV
