Method and system for extracting data from hadoop database (HBase) in incremental way

A technology for incremental data extraction, applied in the field of big data processing, that achieves generality (universality)

Status: Inactive. Publication date: 2017-01-11
BEIJING GEO POLYMERIZATION TECH
Cites: 4 | Cited by: 3

AI Technical Summary

Problems solved by technology

[0013] To overcome the defects of the prior art, the technical problem to be solved by the present invention is to provide a method for incrementally extracting data from HBase that is transparent to the business, is general-purpose, and avoids the problems that, as time goes on, extracting data takes more and more time and more and more redundant data accumulates on the data platform.

Embodiment Construction

[0027] As shown in Figure 1, the method for incrementally extracting data from HBase comprises the following steps:

[0028] (1) Create an index table in HBase to store the index information of all business tables;

[0029] (2) Construct the index table so that it comprises a primary key and columns, with its primary keys sorted in the same way as those of the business table;

[0030] (3) Configure a coprocessor for the business table in HBase that writes index data into the index table;

[0031] (4) Specify the business table and the date to be extracted, and then extract the data;

[0032] (5) Save the extracted data to HDFS (the Hadoop Distributed File System, a distributed file system designed to run on general-purpose hardware. It has much in common with existing distributed file systems, but its differences from them are also clear. HDFS is a...
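The five steps above can be illustrated with a small in-memory simulation. This is a minimal sketch under stated assumptions, not the patent's implementation: `SortedTable` is a hypothetical stand-in for an HBase table (rows kept in lexicographic row-key order), `IndexObserver.post_put` is a hypothetical analogue of a coprocessor hook that fires after each write, the index row key layout `table|yyyymmdd|business_row_key` is assumed (the source only says the index table has a primary key and columns and sorts like the business table), and `save_tsv` writes to a file-like object where a real job would write to an HDFS path.

```python
import bisect
import io
from datetime import datetime


class SortedTable:
    """Hypothetical in-memory stand-in for an HBase table: rows are kept
    in lexicographic row-key order, as HBase stores them."""

    def __init__(self, name):
        self.name = name
        self._keys = []       # sorted row keys
        self._rows = {}       # row key -> {column: value}
        self.observers = []   # hooks run after each put, like a coprocessor

    def put(self, row_key, columns, ts):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)
        for observer in self.observers:
            observer.post_put(self, row_key, ts)

    def get(self, row_key):
        return self._rows.get(row_key)

    def scan(self, start, stop):
        """Yield (row_key, columns) for start <= row_key < stop."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < stop:
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1


class IndexObserver:
    """Hypothetical analogue of a coprocessor hook: after each put on a
    business table, write an index row keyed by table|yyyymmdd|row_key."""

    def __init__(self, index_table):
        self.index_table = index_table

    def post_put(self, table, row_key, ts):
        day = ts.strftime("%Y%m%d")
        self.index_table.put(f"{table.name}|{day}|{row_key}", {"ref": row_key}, ts)


def extract_increment(index_table, business_table, day):
    """Step (4): scan only the index rows for one table and one day,
    then fetch the referenced business rows."""
    prefix = f"{business_table.name}|{day}|"
    stop = prefix[:-1] + chr(ord(prefix[-1]) + 1)  # first key past the prefix
    for _, cols in index_table.scan(prefix, stop):
        row = business_table.get(cols["ref"])
        if row is not None:
            yield cols["ref"], row


def save_tsv(rows, out):
    """Step (5): one tab-separated line per row; a real job would write
    to an HDFS path rather than a local file-like object."""
    for row_key, cols in rows:
        fields = [f"{c}={v}" for c, v in sorted(cols.items())]
        out.write("\t".join([row_key, *fields]) + "\n")


# Steps (1)-(3): create the index table and attach the observer.
index = SortedTable("index")
orders = SortedTable("orders")
orders.observers.append(IndexObserver(index))

orders.put("u001", {"amount": "10"}, datetime(2017, 1, 10, 9, 0))
orders.put("u002", {"amount": "25"}, datetime(2017, 1, 11, 8, 30))
orders.put("u003", {"amount": "7"}, datetime(2017, 1, 11, 17, 45))

# Steps (4)-(5): extract only the rows written on 2017-01-11.
buf = io.StringIO()
save_tsv(extract_increment(index, orders, "20170111"), buf)
print(buf.getvalue(), end="")  # prints the u002 and u003 rows only
```

Because the assumed index key places the date before the business row key, all index rows for one table and one day form a single contiguous run, so step (4) becomes one bounded scan instead of a full-table read, and within that run the rows appear in the same order as in the business table. In this sketch a row updated on several days gets one index entry per day, and extraction returns the row's current value.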

Abstract

The application discloses a method for incrementally extracting data from a Hadoop database (HBase). The method is transparent to the business, is general-purpose, and avoids the problems that, as time goes on, extraction takes more and more time and more and more redundant data accumulates on the data platform. The method comprises the steps of: (1) creating an index table in HBase to store the index information of all business tables; (2) constructing the index table so that it comprises primary keys and columns, wherein the primary keys of the index table are ordered in the same way as those of the business tables; (3) configuring a coprocessor for the business tables in HBase, wherein the coprocessor writes index data into the index table; (4) specifying the business table and the date to be extracted, and then extracting the data; and (5) storing the extracted data in HDFS. The application also provides a corresponding system.
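To make step (2) concrete: HBase compares row keys as unsigned byte arrays, so an index key that begins with the table name and a zero-padded `yyyymmdd` date lets any date range be read with a single `[start, stop)` scan. The sketch below assumes a hypothetical key layout `table SEP yyyymmdd SEP business_key`; the `\x00` separator and the functions `index_row_key` and `scan_range` are illustrative choices, not taken from the patent, and a sorted Python list searched with `bisect` stands in for the index table.

```python
import bisect
from datetime import date, timedelta
from typing import Tuple

SEP = b"\x00"  # hypothetical separator, assumed never to occur in table names


def index_row_key(table: str, day: date, business_key: bytes) -> bytes:
    """Hypothetical layout: table SEP yyyymmdd SEP business_key."""
    return table.encode() + SEP + day.strftime("%Y%m%d").encode() + SEP + business_key


def scan_range(table: str, first: date, last: date) -> Tuple[bytes, bytes]:
    """[start, stop) covering every index row of `table` dated first..last."""
    prefix = table.encode() + SEP
    start = prefix + first.strftime("%Y%m%d").encode() + SEP
    stop = prefix + (last + timedelta(days=1)).strftime("%Y%m%d").encode() + SEP
    return start, stop


# A sorted list of byte keys stands in for the index table.
keys = sorted([
    index_row_key("orders", date(2017, 1, 9), b"u001"),
    index_row_key("orders", date(2017, 1, 10), b"u002"),
    index_row_key("orders", date(2017, 1, 11), b"u003"),
    index_row_key("orders2", date(2017, 1, 10), b"u004"),
    index_row_key("users", date(2017, 1, 10), b"u005"),
])

start, stop = scan_range("orders", date(2017, 1, 10), date(2017, 1, 11))
hits = keys[bisect.bisect_left(keys, start):bisect.bisect_left(keys, stop)]
print([k.rsplit(SEP, 1)[1] for k in hits])  # [b'u002', b'u003']
```

Zero-padding the date is what makes lexicographic order match chronological order. Because `start` and `stop` share the prefix `orders\x00`, every key between them also begins with that prefix, so index rows for tables such as `orders2` or `users` cannot fall inside the range.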

Description

Technical field

[0001] The invention relates to the technical field of big data processing, and in particular to a method and a system for incrementally extracting data from HBase.

Background

[0002] At present, many Internet companies use HBase (Hadoop Database) to store business system data, because it can store massive amounts of data. HBase is an open-source implementation of Google Bigtable: where Bigtable uses GFS as its file storage system, HBase uses Hadoop HDFS; where Google runs MapReduce to process the massive data in Bigtable, HBase uses Hadoop MapReduce; and where Bigtable uses Chubby as its coordination service, HBase uses ZooKeeper.

[0003] Business data analysis is usually performed on a data platform, which requires extracting data from HBase to that platform. Due to the high cost of extracting data from HBase in full, it ...

Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/30
CPC: G06F16/254
Inventors: 范卫卫, 张翼, 温宗臣, 何良均, 崔晶晶, 林佳婕
Owner: BEIJING GEO POLYMERIZATION TECH