Kettle-based method for data extraction and statistics on a big data platform

Big data platform and data extraction technology, applied in database models, relational databases, and electrical digital data processing; it addresses problems such as heavy network resource consumption on big data clusters.

Status: Inactive. Publication date: 2017-02-22

ZHENGZHOU YUNHAI INFORMATION TECH CO LTD

Cites: 6; cited by: 9


AI Technical Summary

Problems solved by technology

[0008] Currently, a MapReduce job scans all table data every day to count data volumes. With tens of billions of records, the daily count takes 4-5 hours, during which the big data cluster's computing and network resources are heavily consumed.

Method used


Examples


Embodiment 1

[0030] A Kettle-based method for data extraction and statistics on a big data platform: the Kettle source code is modified so that the status of each data extraction task is captured and recorded in an HBase table. This HBase table, called the historical situation table, records the data volume of every data table.

[0031] A Sqoop task incrementally extracts the relational database on a daily schedule, so the data extracted in one run is exactly one day's increment. The daily increment of each data table is recorded and written into the HBase table, so that data volumes can be queried by any combination of table and time.
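Because each run records only one day's increment, volumes over any period follow from simple sums instead of a full-table scan. The sketch below illustrates that arithmetic with an in-memory dictionary standing in for the recorded increments; the function names `added_between` and `total_as_of` are hypothetical, and `total_as_of` assumes the table's initial load was also recorded as an increment so that the history starts from zero.

```python
from datetime import date

# Hypothetical stand-in for the recorded history:
# (data table name, day) -> rows extracted by that day's Sqoop run.
Increments = dict[tuple[str, date], int]


def added_between(increments: Increments, table: str, start: date, end: date) -> int:
    """Rows added to `table` from `start` to `end`, both inclusive."""
    return sum(n for (t, d), n in increments.items() if t == table and start <= d <= end)


def total_as_of(increments: Increments, table: str, day: date) -> int:
    """Total rows in `table` at the end of `day`.

    Assumes every load, including the initial one, was recorded as an increment.
    """
    return sum(n for (t, d), n in increments.items() if t == table and d <= day)
```

For example, with increments of 5000, 1200 and 800 rows for `person_info` on 2015-06-03, 2015-06-04 and 2015-06-05, `added_between` over the last two days gives 2000 and `total_as_of` on 2015-06-04 gives 6200, without touching the extracted data itself.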

[0032] Traditional relational databases that support online systems will coexist for a long time with big data technologies that handle offline statistical analysis. Between these two systems, Kettle acts as a bridge responsible for data transmission. Through the transformation of the source co...

Embodiment 2

[0034] On the basis of Embodiment 1, the method in this embodiment records the daily data increment of each data table in a data history table in HBase, and designs the rowkeys (row primary keys) of this history table as follows:

[0035]-[0038] Rowkey design of the history table:

No. | Rowkey pattern | Rowkey example | Qualifier
1 | {table name} | person_info | data volume
2 | {table name}{spacer}{time} | person_info@20150604 | data volume
3 | {time}{spacer}{table name} | 20150604@person_info | data volume

[0039] Here, the table name in each rowkey is the name of the data table being tracked, not the name of the historical situation table.

[0040] For rowkey 1, the data volume qualifier holds the total amount of data in the data table named by the rowkey, so the data volume of a given data table can be queried quickly.

[0041] Rowkey 2 combines the data table name and the time, with the spacer separating the table name from the time. The amount of data in t...
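HBase keeps rows sorted lexicographically by rowkey, so all keys sharing a prefix sit in one contiguous range. The sketch below builds the three rowkeys from the table above and emulates a prefix scan over a sorted key list. The `@` spacer comes from the examples; the reading that rowkey 2 serves "one table across days" and rowkey 3 serves "all tables on one day" is an assumption, since the source paragraph is truncated. The helper names `rowkeys` and `prefix_scan` are hypothetical.

```python
import bisect

SPACER = "@"  # spacer as used in the examples person_info@20150604 and 20150604@person_info


def rowkeys(table: str, day: str) -> tuple[str, str, str]:
    """Return rowkeys 1, 2 and 3 for one data table and one day (yyyymmdd)."""
    return table, f"{table}{SPACER}{day}", f"{day}{SPACER}{table}"


def prefix_scan(sorted_keys: list[str], prefix: str) -> list[str]:
    """Emulate an HBase prefix scan on lexicographically sorted rowkeys.

    Keys sharing a prefix are contiguous, so we seek to the first candidate
    with binary search and read forward until the prefix stops matching.
    """
    start = bisect.bisect_left(sorted_keys, prefix)
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]
```

Scanning the prefix `person_info@` returns that table's per-day rows, while `20150604@` returns every table's row for that day. Ending the prefix with the spacer keeps a table such as `person` from also matching `person_info`.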

Embodiment 3

[0045] On the basis of Embodiment 2, after the Sqoop task information is obtained, this embodiment records the data volume of the task in the historical situation table; all three rowkeys in the historical situation table must be written or updated.
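A minimal sketch of that write-or-update step, assuming a plain dictionary (rowkey to {qualifier: value}) as a stand-in for the HBase historical situation table. The qualifier name `data_volume` and the function `record_task` are hypothetical. It also assumes that rowkey 1 accumulates the running total and that rowkeys 2 and 3 accumulate the volume for that table and day, so that several tasks on the same day add up.

```python
SPACER = "@"
QUALIFIER = "data_volume"  # hypothetical qualifier name for the "data volume" column


def record_task(history: dict, table: str, day: str, rows: int) -> None:
    """Write or update all three rowkeys after one extraction task finishes.

    `history` maps rowkey -> {qualifier: value}, standing in for the HBase table.
    """
    keys = (
        table,                        # rowkey 1: running total for the table
        f"{table}{SPACER}{day}",      # rowkey 2: table, then day
        f"{day}{SPACER}{table}",      # rowkey 3: day, then table
    )
    for key in keys:
        row = history.setdefault(key, {})
        # Create the cell on first write, otherwise add this task's rows to it.
        row[QUALIFIER] = row.get(QUALIFIER, 0) + rows
```

Against a real cluster, the read-then-add step would race between concurrent tasks; the HBase Java client's atomic `Table.incrementColumnValue` is one way such counters are commonly maintained, though the source does not say how the original method performs the update.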


Abstract

The invention discloses a Kettle-based method for data extraction and statistics on a big data platform. The Kettle source code is modified so that the status of each data extraction task is captured and recorded in an HBase table, called the historical situation table, which records the data volumes of all data tables. A Sqoop task performs scheduled incremental extraction from the relational databases every day, so the data extracted in one run is one day's increment. The daily increment of each data table is recorded and written into the HBase table, so that data volumes can be queried by any combination of table and time. Because the increments are recorded during data extraction itself, no additional time is needed and almost no computing or network resources are consumed.

Description

Technical field

[0001] The invention relates to the technical field of computer software applications, in particular to a Kettle-based method for data extraction and statistics on a big data platform.

Background art

[0002] With the continuous development of cloud computing, it has become an important pillar supporting the development of information technology across industries. Distributed clusters based on Hadoop and HBase have become popular research subjects in cloud computing in China and abroad. Hadoop's HDFS provides a distributed file storage system for the cloud platform, and HBase offers good read and write performance and supports tables with very large amounts of data, so it is suitable both for online databases with simple business logic and huge data volumes and for data warehouses.

[0003] Since HBase itself is not suitable as a business database, the business database is often served by mature relational databases such a...


Application Information


Patent type & authority: Application (China)

IPC (8th edition): G06F17/30

CPC: G06F16/284; G06F16/2282; G06F16/254

Inventors: 臧勇真 (Zang Yongzhen), 魏金雷 (Wei Jinlei)

Owner: ZHENGZHOU YUNHAI INFORMATION TECH CO LTD
