Storm-based data ETL (Extract, Transform, Load) system and Storm-based processing method

A data processing system and method in the field of electrical digital data processing and special data processing applications. It addresses the problem that the migration of large volumes of streaming data is not handled well by existing tools, and is described as highly practical.

Active Publication Date: 2015-12-30
SHANDONG INSPUR SCI RES INST CO LTD

AI Technical Summary

Problems solved by technology

[0003] ETL tools based on stream data processing frameworks have not yet matured into products, so the migration of large volumes of streaming data is not handled well. To address this, the present invention provides a Storm-based data ETL system and processing method. It uses a stream data framework to transform data held in relational databases and integrate it into a data warehouse, and is highly practical for that purpose.



Examples


Detailed Description of the Embodiments

[0016] The present invention will be further described in conjunction with the accompanying drawings.

[0017] A Storm-based data ETL system includes a controller module, a connector module, and a distributed computing engine.

[0018] The controller module receives user input, selects the data source and target data storage connectors for the distributed computing engine, sets the ETL topology of the distributed computing engine, and, once the settings are complete, calls the distributed computing engine to start the ETL task.
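The first step of the controller, turning user input into settings, might look like the following minimal sketch. The flag names (`--source`, `--target`, `--threads`, `--clean-fields`), the storage type labels, and the function `parse_etl_command` are hypothetical and are not taken from the patent; only Python's standard `argparse` module is used, and no Storm API is called.

```python
import argparse

# Hypothetical labels for the three storage kinds named in the description.
STORAGE_KINDS = ["rdbms", "hbase", "hdfs"]


def parse_etl_command(argv):
    """Parse a hypothetical ETL command line into a plain settings dict."""
    parser = argparse.ArgumentParser(prog="etl")
    parser.add_argument("--source", required=True, choices=STORAGE_KINDS)
    parser.add_argument("--target", required=True, choices=STORAGE_KINDS)
    parser.add_argument("--threads", type=int, default=1)
    parser.add_argument("--clean-fields", default="")
    args = parser.parse_args(argv)

    if args.threads < 1:
        parser.error("--threads must be at least 1")

    # Split the comma-separated field list and drop empty entries.
    fields = [f.strip() for f in args.clean_fields.split(",") if f.strip()]
    return {
        "source": args.source,
        "target": args.target,
        "threads": args.threads,
        "clean_fields": fields,
    }
```

A controller built along these lines would pass the resulting dict on to connector selection and topology setup before starting the task.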

[0019] The topology settings include the number of execution threads, the portion of the data source that each thread needs to extract, the data fields that need to be cleaned, and so on.
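As an illustration of dividing a data source among execution threads, the sketch below splits an inclusive numeric key range into contiguous per-thread ranges. The patent text shown here does not say how the division is computed, so the range-based scheme, the `TopologySettings` class, and the `split_key_range` function are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class TopologySettings:
    """Hypothetical container for the settings named in paragraph [0019]."""
    threads: int
    partitions: list = field(default_factory=list)    # one (start, end) per thread
    clean_fields: list = field(default_factory=list)  # fields to be cleaned


def split_key_range(min_key, max_key, threads):
    """Split the inclusive range [min_key, max_key] into contiguous
    inclusive sub-ranges, at most one per thread, as evenly as possible."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if max_key < min_key:
        raise ValueError("max_key must not be smaller than min_key")

    total = max_key - min_key + 1
    base, extra = divmod(total, threads)
    ranges = []
    start = min_key
    for i in range(threads):
        # The first `extra` threads take one additional key each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # more threads than keys: remaining threads get nothing
        ranges.append((start, start + size - 1))
        start += size
    return ranges


settings = TopologySettings(
    threads=3,
    partitions=split_key_range(1, 10, 3),
    clean_fields=["name"],
)
```

Splitting on a numeric primary key is only one option; a real deployment could equally divide by table, by date, or by hash of a key.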

[0020] The connector module has built-in connection drivers for relational databases, HBase, and HDFS, which the distributed computing engine calls when connecting to the data source and the target data storage.
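One way to picture a connector module with built-in drivers is a small registry keyed by storage type, from which the controller picks a connector. The sketch below is an assumption-laden illustration: the class names, the `CONNECTORS` table, the type labels, and `select_connector` are all hypothetical, and the classes are placeholders that do not open real connections.

```python
class Connector:
    """Base class for hypothetical storage connectors (no real I/O)."""

    def __init__(self, uri):
        self.uri = uri

    def describe(self):
        return f"{type(self).__name__}({self.uri})"


class RelationalConnector(Connector):
    """Placeholder for a relational database driver."""


class HBaseConnector(Connector):
    """Placeholder for an HBase driver."""


class HdfsConnector(Connector):
    """Placeholder for an HDFS driver."""


# Registry of the three storage kinds named in the description.
CONNECTORS = {
    "rdbms": RelationalConnector,
    "hbase": HBaseConnector,
    "hdfs": HdfsConnector,
}


def select_connector(kind, uri):
    """Return a connector instance for the given storage kind."""
    try:
        cls = CONNECTORS[kind]
    except KeyError:
        raise ValueError(f"unsupported storage type: {kind!r}") from None
    return cls(uri)
```

Keeping the drivers behind a single lookup means supporting a new storage type only requires a new entry in the registry, which matches the stated goal of having the controller select connectors automatically.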

[0021] The distribu...



Abstract

The invention discloses a Storm-based data ETL (Extract, Transform, Load) system and a Storm-based processing method, and belongs to the technical field of data ETL management. The system is divided into a controller module, a connector module, and a distributed computing engine. The controller module receives a user command, parses the settings in the command, and starts a data ETL task. The connector module has built-in connection drivers for relational databases, HBase, and HDFS (Hadoop Distributed File System), which are called when the distributed computing engine connects to the data source and the target data storage. Storm serves as the distributed computing engine and receives the parameters set by the controller module to carry out the data ETL task. The user does not need to write Storm code and only needs to enter a command: the controller module parses the command, configures Storm automatically, and submits the ETL task. The connection drivers for all supported data sources and target data storages are packaged in the connector module and are selected and called automatically by the controller.

Description

Technical field

[0001] The invention discloses a Storm-based data ETL system and processing method, belonging to the technical field of data ETL management.

Background

[0002] Data integration is the logical or physical consolidation of data from different sources, in different formats, and with different characteristics, so as to provide comprehensive data sharing. It is an important part of enterprise business intelligence and data warehouse systems. ETL is the main solution for enterprise data integration. The three letters in ETL stand for Extract, Transform, and Load, that is, extraction, transformation, and loading. Data extraction is the process of extracting data from a data source. As the volume of enterprise data continues to grow, the original relational database can no longer meet users' needs, and data must be migrated to a data warehouse that can scale horizontally, such as Hadoop or an MPP architecture platform. The way to extract data from the database is g...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/254
Inventor: 曹连超, 卢军佐, 亓开元
Owner: SHANDONG INSPUR SCI RES INST CO LTD