HFile based data batch loading method

A data batch loading technology, applied in the field of data processing, that addresses low data loading efficiency and long loading times. It improves the efficiency of writing HFile files, avoids excessive Region splitting time, and improves overall loading efficiency.

Active Publication Date: 2016-10-05
HANGZHOU HIKVISION DIGITAL TECH

AI Technical Summary

Problems solved by technology

[0007] First, a newly created HBase table has only one Region by default. Loaded data initially goes into this Region; when the amount of data reaches a threshold, the Region is split into two, and the resulting Regions are distributed to other RegionServers to keep the cluster load balanced. Region splitting is a time-consuming process, however, and makes data loading inefficient.
[0008] Second, the HFile files are generated through MapReduce. Because the records in an HFile must be arranged in lexicographical order, all data must first be written to temporary files during the Map phase, then read back from those temporary files during the Reduce phase, where it is sorted and written out as HFile files. This process is very time-consuming.
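
The lexicographical ordering requirement in [0008] can be illustrated with a small sketch. It assumes row keys are compared as raw byte strings; the key values below are hypothetical and chosen only to show how byte order differs from numeric order.

```python
# Hypothetical row keys; lexicographic (byte-wise) order is not numeric order.
row_keys = [b"row2", b"row10", b"row1", b"row20"]

# Python compares bytes objects lexicographically, byte by byte.
sorted_keys = sorted(row_keys)
print(sorted_keys)

# Zero-padding the numeric suffix makes byte order agree with numeric order.
padded = sorted(b"row%04d" % n for n in (2, 10, 1, 20))
print(padded)
```

Because every record in a file must follow this order, a writer either receives records already sorted or has to sort them itself before writing, which is the cost the Reduce phase pays in the prior-art approach.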

Examples

Embodiment Construction

[0053] To make the objectives, technical solution and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments.

[0054] The present invention provides a method for batch loading data based on HFile. As shown in Figure 1, the method includes the following steps:

[0055] Pre-split the Region to form multiple Regions, each corresponding one-to-one to an HFile file;

[0056] Read the source data records, determine for each source data record the HFile file it is to be written to, and write the record into that HFile file;

[0057] Load each HFile file into which source data records have been written into its corresponding Region.
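
The steps in [0055] to [0057] can be sketched in plain Python as follows. This is a minimal model under stated assumptions, not the patented implementation: the helper names (`make_split_keys`, `region_index`, `build_region_files`) are hypothetical, split points are assumed to divide the first key byte evenly, and in-memory lists stand in for HFile files.

```python
import bisect
from collections import defaultdict

def make_split_keys(num_regions):
    """Assumed pre-split scheme: divide the first key byte (0x00..0xFF) evenly."""
    step = 256 // num_regions
    return [bytes([i * step]) for i in range(1, num_regions)]

def region_index(split_keys, row_key):
    """Region i covers keys in [split_keys[i-1], split_keys[i]), start inclusive."""
    return bisect.bisect_right(split_keys, row_key)

def build_region_files(records, split_keys):
    """Route each record to the bucket of its Region, then sort each bucket."""
    buckets = defaultdict(list)
    for row_key, value in records:
        buckets[region_index(split_keys, row_key)].append((row_key, value))
    # Each bucket stands in for one file and must be in lexicographic key order.
    return {idx: sorted(rows) for idx, rows in buckets.items()}

split_keys = make_split_keys(4)
records = [(b"\x90b", "v1"), (b"\x10a", "v2"), (b"\x90a", "v3"), (b"\xf0z", "v4")]
files = build_region_files(records, split_keys)
for idx in sorted(files):
    print(idx, files[idx])
```

Since every bucket maps to exactly one pre-split Region, no Region needs to split while the data is loaded. The loading step itself is not modelled here; in a real deployment the split keys would be supplied when the table is created and the finished files handed to HBase's bulk-load facility.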

[0058] In a typical embodiment of this application, the multiple Regions corresponding one-to-one to the HFile files can be formed as follows:

[0059] When pre-partitioning...


Abstract

The invention provides an HFile-based method for batch loading data. A Region is pre-split into a plurality of Regions, which solves the prior-art problem of excessively long Region split times. On this basis, the HFile files are generated in the Map stage, so the Reduce process is avoided and efficiency is improved. Because each HFile file is completed in the Map stage, every generated HFile file can be saved, which solves the problem of all intermediate results being deleted after a MapReduce task fails and further improves the efficiency of writing HFile files.
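
The idea of keeping completed files across a failed run can be illustrated with a small sketch. The mechanism shown (write to a temporary name, rename on completion, skip files that already exist) is an assumption made for illustration; the source does not say how the files are preserved. The function `write_region_file` and the `region-N.dat` naming are hypothetical, and plain text files stand in for HFile files.

```python
import os
import tempfile

def write_region_file(out_dir, region_id, rows):
    """Write one sorted file per Region; return (path, True if written now)."""
    final_path = os.path.join(out_dir, f"region-{region_id}.dat")
    if os.path.exists(final_path):
        # Completed in an earlier run: keep it instead of regenerating it.
        return final_path, False
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w") as f:
        for key, value in sorted(rows):
            f.write(f"{key}\t{value}\n")
    # Rename only after the file is fully written, so partial output
    # never appears under the final name.
    os.replace(tmp_path, final_path)
    return final_path, True

out_dir = tempfile.mkdtemp()
rows = [("b", "2"), ("a", "1")]
path, first_run = write_region_file(out_dir, 0, rows)
_, second_run = write_region_file(out_dir, 0, rows)
print(first_run, second_run)
```

A rerun after a failure then only has to produce the files that are still missing, rather than starting from nothing.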

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a method for batch loading data based on HFile.

Background

[0002] With the rapid development of network technology and the rapid growth of data volume, traditional technologies have run into major obstacles in analyzing and exploiting these huge data resources and cannot handle big data analysis. To meet the requirements of big data analysis, Google proposed MapReduce, a programming model for large-scale data analysis, processing and parallel computing.

[0003] HBase (Hadoop Database) is a highly reliable, high-performance, column-oriented, scalable distributed storage system. HBase can serve as both the data source and the data destination of MapReduce, so that MapReduce can process data stored in HBase or store its output data in HBase.

[0004] When HBase is used as the dat...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 谢忠贤, 浦世亮, 周明耀
Owner: HANGZHOU HIKVISION DIGITAL TECH