HBase database-based data batch loading method and device

A batch data loading technology in the field of HBase databases. It addresses the long loading times and low efficiency of batch data loading, with the effect of improving HBase batch loading efficiency, increasing the generation speed and ...

Active Publication Date: 2016-07-27
ULTRAPOWER SOFTWARE
Cites: 2 | Cited by: 2

AI Technical Summary

Problems solved by technology

[0003] The present invention provides a method and device for batch data loading based on the HBase database, to solve the problem that existing batch loading of data into an HBase database is time-consuming and inefficient.

Examples

Embodiment Construction

[0043] The core idea of the present invention is as follows. Targeting the key bottleneck that limits batch loading into an HBase database, the row keys are extracted from the source data to be loaded and evenly partitioned according to the specified number of partitions. This divides the partition ranges effectively and avoids data skew and cross-partition problems while the HFile files are being generated.
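
The following minimal Java sketch illustrates the idea in [0043]. It is not the patent's implementation. It assumes the partition boundaries are already known as a sorted list of end row keys, where partition i holds keys greater than end key i-1 and up to end key i. The sketch routes every row key to exactly one partition. It then sorts each partition in its own task, as a stand-in for writing one HFile per partition. The class and method names are hypothetical, and real HFile writing (for example through HBase's MapReduce class `HFileOutputFormat2`) is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch: route each row key to exactly one partition, then build
 * every partition's sorted output in its own task, standing in for one HFile
 * per partition. Partition i holds keys in (endKeys[i-1], endKeys[i]]; the last
 * partition is open-ended.
 */
final class PartitionedHFileSketch {
    private PartitionedHFileSketch() {}

    /** Returns the index of the first end key >= rowKey (unsigned byte order). */
    static int partitionOf(byte[] rowKey, List<byte[]> endKeys) {
        int lo = 0;
        int hi = endKeys.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (Arrays.compareUnsigned(endKeys.get(mid), rowKey) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Groups row keys by partition and sorts each group in parallel. */
    static List<List<byte[]>> generate(List<byte[]> rowKeys, List<byte[]> endKeys, int threads) {
        List<List<byte[]>> buckets = new ArrayList<>();
        for (int i = 0; i <= endKeys.size(); i++) {
            buckets.add(new ArrayList<>());
        }
        for (byte[] key : rowKeys) {
            buckets.get(partitionOf(key, endKeys)).add(key);
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<byte[]>>> futures = new ArrayList<>();
            for (List<byte[]> bucket : buckets) {
                // Each task sorts only its own bucket, so no key crosses a partition.
                futures.add(pool.submit(() -> {
                    bucket.sort(Arrays::compareUnsigned); // HFile cells must be in key order
                    return bucket;
                }));
            }
            List<List<byte[]>> outputs = new ArrayList<>();
            for (Future<List<byte[]>> f : futures) {
                outputs.add(f.get());
            }
            return outputs;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Take the end keys `b` and `d` and the row keys `d`, `a`, `c`, `e`, `b`. The sketch produces three sorted groups: `a b`, `c d`, and `e`. Because each group depends only on its own keys, the groups can be built concurrently without coordination.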

[0044] Figure 1 is a flow chart of a method for batch data loading based on the HBase database according to an embodiment of the present invention. Referring to Figure 1, when no HBase table exists in the HBase database, the batch data loading method of the present invention comprises:

[0045] Step S110: extract and sort the row keys of the source data to be loaded, evenly partition the sorted row keys according to the specified number of partitions, and determine the row keys corresponding to the end v...
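
The following minimal Java sketch illustrates Step S110. It assumes that "average partitioning" means splitting the sorted row keys into ranges with (nearly) equal numbers of records. Row keys are compared as unsigned bytes, which is the order HBase uses for row keys. The names `RowKeyPartitioner` and `endKeys` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of Step S110: sort the extracted row keys and split them
 * into numPartitions ranges of (nearly) equal size, returning the row key that
 * ends each range. The last range is open-ended, so it needs no end key.
 */
final class RowKeyPartitioner {
    private RowKeyPartitioner() {}

    /** Unsigned lexicographic byte comparison, the order HBase uses for row keys. */
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    static List<byte[]> endKeys(List<byte[]> rowKeys, int numPartitions) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be >= 1");
        }
        List<byte[]> sorted = new ArrayList<>(rowKeys);
        sorted.sort(RowKeyPartitioner::compare);

        List<byte[]> ends = new ArrayList<>();
        int n = sorted.size();
        for (int p = 1; p < numPartitions; p++) {
            // Index of the last key in range p-1 when n keys are split evenly.
            int idx = (int) ((long) n * p / numPartitions) - 1;
            if (idx < 0) {
                continue; // fewer keys than partitions: this range would be empty
            }
            byte[] candidate = sorted.get(idx);
            // Skip repeated row keys so that the end keys stay strictly increasing.
            if (ends.isEmpty() || compare(ends.get(ends.size() - 1), candidate) < 0) {
                ends.add(candidate);
            }
        }
        return ends;
    }
}
```

For the ten row keys `a` to `j` and five partitions, the sketch returns the end keys `b`, `d`, `f`, `h`. This gives five ranges of two keys each; the last, open-ended range needs no end key.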

Abstract

The invention discloses an HBase database-based data batch loading method and device. The method comprises the following steps: extracting the row keys of the source data to be loaded, sorting them, and evenly partitioning the sorted row keys according to a specified number of partitions, so as to determine the row key corresponding to the end value of each partition range; adding a predetermined length to each of these row keys to serve as the end value of each pre-established partition range; judging whether an HBase table exists in the HBase database; if not, creating an HBase table and establishing partitions in it according to the end values of the pre-established partition ranges; generating the corresponding HFile files for the source data in parallel according to the partitions of the HBase table; and importing the HFile files into the HBase table in batches. The disclosed method increases both the HFile generation speed and the loading speed, and thereby greatly improves the efficiency of HBase batch loading.
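
The abstract does not say what is added to each boundary row key. The Java sketch below illustrates one plausible reading, which is an assumption. HBase region boundaries are start-inclusive and end-exclusive. Using a boundary row key K directly as a split key would therefore place K at the start of the next region. Appending a fixed number of bytes (here, hypothetically, zero bytes) yields a split key just above K, which keeps K in the partition that ends with it. `SplitKeyPadding`, `pad`, and `regionOf` are hypothetical names.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration of "adding a predetermined length" to a boundary
 * row key. ASSUMPTION: the extra bytes are appended and are 0x00; the source
 * does not specify either detail.
 */
final class SplitKeyPadding {
    private SplitKeyPadding() {}

    /** Returns a copy of endRowKey with padLength zero bytes appended. */
    static byte[] pad(byte[] endRowKey, int padLength) {
        if (padLength < 1) {
            throw new IllegalArgumentException("padLength must be >= 1");
        }
        // Arrays.copyOf fills the new trailing positions with 0x00.
        return Arrays.copyOf(endRowKey, endRowKey.length + padLength);
    }

    /**
     * Region index of rowKey for sorted split keys, using HBase's convention that
     * a region's start key is inclusive and its end key exclusive: the result is
     * the number of split keys that are <= rowKey.
     */
    static int regionOf(byte[] rowKey, byte[][] splitKeys) {
        int lo = 0;
        int hi = splitKeys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (Arrays.compareUnsigned(splitKeys[mid], rowKey) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

Consider the boundary row key `b`. Used as a raw split key, `b` falls into region 1. Padded to `b` plus zero bytes, the split key keeps `b` in region 0 with the rest of its partition. Under this reading, the padded keys are what would then be passed as split keys when the table is created, for example through the `byte[][] splitKeys` argument of HBase's `Admin.createTable`.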

Description

Technical Field

[0001] The invention relates to the technical field of HBase databases, and in particular to a method and device for batch loading of data based on the HBase database.

Background

[0002] HBase is a highly reliable, high-performance, column-oriented, and scalable distributed database. Unlike typical relational databases, HBase is suited to storing unstructured data. With HBase, large-scale structured storage clusters can be built on inexpensive PC servers, which effectively reduces storage costs in big-data settings. However, HBase has a problem with batch data loading: loading large volumes of data with the tools provided by HBase itself is slow, time-consuming, and inefficient. For example, loading a data file of several hundred gigabytes usually takes 23-24 hours or even longer. The batch loading steps are roughly as follows: 1. First, the data files are generated in p...

Application Information

IPC(8): G06F17/30
Inventors: Tang Zhengcai (唐正才), Wang Qinglei (王庆磊), Zhang Guobo (张国波)
Owner: ULTRAPOWER SOFTWARE