Data loading system and data loading method for importing Hive mass data into Hbase

A data loading technology for massive data, applied in the field of Hbase databases, which can solve problems such as frequent RegionServer flushes, data skew and RegionServer node downtime.

Pending Publication Date: 2022-04-29
CHONGQING CHANGAN AUTOMOBILE CO LTD
Cites: 1; Cited by: 0

AI Technical Summary

Problems solved by technology

In this scenario, if Hive-Hbase mapping tables are used, data writes are ultimately handled by calling Hbase's Put API, which is very likely to put heavy write pressure on the Hbase RegionServer nodes, destabilize the Hbase cluster and even bring a RegionServer down. Specifically, this scheme has the following problems:
(1) It causes frequent full GC on the RegionServer, which can bring the RegionServer node down.
(2) It causes frequent RegionServer flushes, followed by continual compaction and region splitting, which affects cluster stability.
(3) It consumes large amounts of the Hbase cluster's CPU, disk, network bandwidth, memory and I/O resources, competing for resources with other workloads.
Generating HFiles with Map-Reduce avoids the cluster resource consumption of calling the Hbase Put API, but still faces the following problems:
(1) The Map-Reduce engine is built on a multi-process model and performs repeated I/O on temporary data during computation, so it is inefficient.
(2) When generating HFiles, Map-Reduce uses the default reducer provided by Hbase, which sets the number of generated HFiles according to the number of regions in the Hbase table. Obvious data skew tends to occur during computation, which prolongs data loading.
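
The skew described in (2) can be illustrated with a small stdlib-only sketch: each rowkey is assigned to the region whose key range contains it (HBase's own MapReduce bulk-load path runs one reduce task per region, with a total-order partitioner over the region boundaries), and rows are counted per partition. The split keys and the year-prefixed rowkeys below are hypothetical; the point is that when rowkeys cluster in one key range, a single reduce task, and therefore a single HFile, receives almost all the data.

```python
from bisect import bisect_right
from collections import Counter


def region_index(rowkey, split_keys):
    """Index of the region holding rowkey.

    split_keys are the sorted start keys of every region except the first,
    which starts at the empty key; a region's start key is inclusive.
    """
    return bisect_right(split_keys, rowkey)


def rows_per_partition(rowkeys, split_keys):
    """Count rows per region, i.e. per reduce task when there is one reducer per region."""
    counts = Counter(region_index(key, split_keys) for key in rowkeys)
    return [counts.get(i, 0) for i in range(len(split_keys) + 1)]


# Hypothetical table pre-split at b"3" and b"6" into three regions.
split_keys = [b"3", b"6"]
# Hypothetical rowkeys: most share the same year prefix, so they land in one region.
rowkeys = [f"2022{i:04d}".encode() for i in range(1000)] + [b"5abc", b"9xyz"]

counts = rows_per_partition(rowkeys, split_keys)
print(counts)  # [1000, 1, 1]
```

Three regions yield three reduce tasks here, but one of them does almost all of the work, so the whole load waits on that single task.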

Examples

Embodiment

[0019] As shown in Figure 1, the data loading system for importing Hive mass data into Hbase provided by this embodiment comprises a Hive database, an Hbase database, a Spark-HFile generation module and an HFile online data loading module. The Spark-HFile generation module reads the data to be loaded from a Hive table in the Hive database and, according to the mapping configuration between the Hive table fields and the Hbase table in the Hbase database, generates the underlying HFile files required by the Hbase table, then writes these HFiles into a specified HDFS directory. The HFile online data loading module loads the HFiles in the HDFS directory specified by the Spark-HFile generation module into the Hbase table online to provide business query services.
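
The field mapping performed by the Spark-HFile generation module can be sketched without Spark. This sketch assumes a hypothetical mapping format (one Hive column chosen as the rowkey, every other mapped column assigned a column family and qualifier) and plain Python dicts as Hive rows. It only shows the transformation and the sort order HFiles require (rowkey, then column family, then qualifier); it is not the patent's actual configuration format or Spark job.

```python
# Hypothetical mapping config: which Hive column becomes the rowkey, and
# which Hbase column family and qualifier each remaining Hive column maps to.
MAPPING = {
    "rowkey_field": "user_id",
    "columns": {
        "name": ("f", "name"),
        "city": ("f", "city"),
    },
}


def to_cells(hive_rows, mapping):
    """Convert Hive rows (dicts) into (rowkey, family, qualifier, value) byte tuples.

    Cells are returned in the order an HFile must be written: sorted by
    rowkey, then column family, then qualifier.
    """
    key_field = mapping["rowkey_field"]
    cells = []
    for row in hive_rows:
        rowkey = str(row[key_field]).encode("utf-8")
        for column, (family, qualifier) in mapping["columns"].items():
            value = row.get(column)
            if value is None:  # skip Hive NULLs rather than writing empty cells
                continue
            cells.append((rowkey, family.encode("utf-8"),
                          qualifier.encode("utf-8"), str(value).encode("utf-8")))
    # Python compares bytes as unsigned bytes, matching Hbase's lexicographic key order.
    cells.sort(key=lambda cell: cell[:3])
    return cells


rows = [
    {"user_id": 2, "name": "b", "city": None},
    {"user_id": 1, "name": "a", "city": "cq"},
]
for cell in to_cells(rows, MAPPING):
    print(cell)
```

Sorting on the first three fields is enough here because each (rowkey, family, qualifier) combination appears once; a real HFile writer also orders cells by timestamp.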

[0020] The above modules are all deployed as task nodes on a big data offline scheduling system, and the data to be loaded in the Hive table is loaded online into...
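
One way to picture the two modules as task nodes is a dependency graph in which the load task runs only after HFile generation succeeds. The sketch below is a stand-in for the scheduling system, which the source does not name. The task names and the success-flag convention are hypothetical, and the sketch uses Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it depends on.
TASKS = {
    "spark_hfile_generate": set(),
    "hfile_online_load": {"spark_hfile_generate"},
}


def run_pipeline(tasks, actions):
    """Run tasks in dependency order and stop at the first failure.

    actions maps each task name to a zero-argument callable returning True on
    success. Stopping early means the load task never runs against a
    partially written HFile directory.
    """
    finished = []
    for name in TopologicalSorter(tasks).static_order():
        if not actions[name]():
            raise RuntimeError(f"task {name} failed; downstream tasks skipped")
        finished.append(name)
    return finished


print(run_pipeline(TASKS, {
    "spark_hfile_generate": lambda: True,
    "hfile_online_load": lambda: True,
}))
```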

Abstract

The invention discloses a data loading system and a data loading method for importing Hive mass data into Hbase. The data loading method comprises the following steps. Step 1: a Spark-HFile generation module configured on a big data offline scheduling system reads the data to be loaded from a Hive table; the Apache Spark distributed computing engine then uses that data, according to the mapping configuration between the Hive table fields and the Hbase table, to generate the underlying HFile files required by the Hbase table, and writes them into a specified HDFS directory. Step 2: an HFile online data loading module configured on the big data offline scheduling system loads the HFiles in the HDFS directory from step 1 into the Hbase table online, providing a user-oriented business query service.
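
The hand-off between the two steps runs through the specified HDFS directory. HBase's bulk-load tooling expects that directory to contain one subdirectory per column family holding the HFiles, so a rough local analogue is sketched below. Everything in it is an assumption for illustration: a temporary local directory stands in for HDFS, tab-separated text stands in for the binary HFile format, and the function names and the `part-00000` file name are hypothetical.

```python
import os
import tempfile
from collections import defaultdict


def step1_write(cells, out_dir, file_name="part-00000"):
    """Step 1 stand-in: write cells under <out_dir>/<family>/<file_name>.

    Text lines (rowkey, qualifier, value separated by tabs) replace the
    binary HFile format; values must not contain tabs or newlines.
    Returns the column families written.
    """
    by_family = defaultdict(list)
    for rowkey, family, qualifier, value in sorted(cells):
        by_family[family].append((rowkey, qualifier, value))
    for family, entries in by_family.items():
        family_dir = os.path.join(out_dir, family)
        os.makedirs(family_dir, exist_ok=True)
        with open(os.path.join(family_dir, file_name), "w", encoding="utf-8") as fh:
            for rowkey, qualifier, value in entries:
                fh.write(f"{rowkey}\t{qualifier}\t{value}\n")
    return sorted(by_family)


def step2_load(out_dir, table):
    """Step 2 stand-in: read every family directory into an in-memory table."""
    for family in sorted(os.listdir(out_dir)):
        family_dir = os.path.join(out_dir, family)
        for name in sorted(os.listdir(family_dir)):
            with open(os.path.join(family_dir, name), encoding="utf-8") as fh:
                for line in fh:
                    rowkey, qualifier, value = line.rstrip("\n").split("\t")
                    table.setdefault(rowkey, {})[f"{family}:{qualifier}"] = value
    return table


def demo():
    """Run both steps against a temporary directory standing in for HDFS."""
    cells = [("u1", "f", "name", "a"), ("u1", "g", "city", "cq"), ("u2", "f", "name", "b")]
    with tempfile.TemporaryDirectory() as hdfs_dir:
        families = step1_write(cells, hdfs_dir)
        table = step2_load(hdfs_dir, {})
    return families, table


print(demo())
```

Keeping generation and loading as separate steps that share only a directory is what lets the load step move finished files into the table instead of replaying individual Put calls.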

Description

Technical field

[0001] The invention belongs to the technical field of Hbase databases, and in particular relates to a data loading system and a data loading method for importing Hive mass data into Hbase.

Background art

[0002] Hbase is a highly reliable, high-performance, column-oriented, scalable distributed database. Unlike general relational databases, it is suited to unstructured data storage. Data engineering teams often need to import massive Hive data into Hbase through ETL processing to build user-oriented, high-concurrency query services.

[0003] The business scenario for batch-importing massive Hive data into Hbase is as follows: the data to be written resides in Hive, and the business needs to periodically import this massive data into Hbase for random queries and updates. In this scenario, if the scheme of establishing Hive and Hbase mapping tables is adopted, Hbase's Put Api will be called to proces...

Application Information

IPC(8): G06F16/25; G06F16/182
CPC: G06F16/254; G06F16/182
Inventor: 黄立蓝文良
Owner: CHONGQING CHANGAN AUTOMOBILE CO LTD