ETL system based on Spark technology and method thereof

An ETL system and method based on Spark technology, applied in fields such as special data processing applications, instruments, and electrical digital data processing. It addresses the high price, difficult expansion, and IO-throughput and system-resource bottlenecks of existing approaches, and aims at convenient maintenance, reduced disk I/O, and improved performance and resource utilization.

Status: Inactive
Publication Date: 2017-06-27
广东奡风科技股份有限公司
5 Cites, 27 Cited by

AI Technical Summary

Problems solved by technology

[0006] The present invention proposes an ETL system and method based on Spark technology that address the shortcomings of the prior art when processing massive data: bottlenecks in IO throughput and system resources, difficult expansion, and high price.


Examples

Embodiment Construction

[0031] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0032] In order to facilitate and clarify the description of the following examples, some terms are explained before describing the specific embodiments of the present invention in detail, and the following explanations apply to the specification and claims.

[0033] The term ETL as used in the present invention is an abbreviation of Extract-Transform-Load, which describes the process of extracting, transforming, and loading data from the source t...

Abstract

The invention discloses an ETL system based on Spark technology. The system comprises a data extraction module, a data processing module, a data integration module, a data output module, a metadata management module, and a data storage module. The data storage module comprises a transit data repository, an integrated data repository, and a metadata control file. The data extraction module extracts source data, dynamically generates multiple Spark RDDs on distributed nodes, and processes the Spark RDDs in parallel. The data processing module reads the Spark RDDs generated by the data extraction module, performs a metadata matching check and data conversion, and saves the resulting data into the transit data repository. The data integration module integrates the transit data of the current day with the integrated data of the previous day and saves the result into the integrated data repository. The data output module converts the format of the data integrated on the current day and outputs the converted data. Because the system is based on Spark, it can scale out linearly and smoothly, runs fast, requires no manual intervention, and is easy to manage and maintain.
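
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: plain Python lists stand in for Spark RDD partitions so that the example runs without a cluster, and the metadata layout (`METADATA`), the function names, the `id` merge key, and the rule that the newer record wins during integration are all assumptions made for illustration, since this excerpt does not specify them.

```python
from typing import Callable

# Hypothetical metadata: field name -> converter. The format of the patent's
# metadata control file is not given in this excerpt.
METADATA: dict[str, Callable[[str], object]] = {
    "id": int,
    "name": str.strip,
    "amount": float,
}


def extract(lines: list[str], num_partitions: int) -> list[list[str]]:
    """Split source lines into partitions (a stand-in for Spark RDD partitions)."""
    return [lines[i::num_partitions] for i in range(num_partitions)]


def process(partition: list[str]) -> list[dict]:
    """Check each record against the metadata and convert its fields."""
    rows = []
    for line in partition:
        fields = line.split(",")
        if len(fields) != len(METADATA):
            continue  # metadata mismatch: wrong number of fields
        try:
            rows.append({k: conv(v) for (k, conv), v in zip(METADATA.items(), fields)})
        except ValueError:
            continue  # metadata mismatch: a value does not convert
    return rows


def integrate(previous: dict[int, dict], transit: list[dict]) -> dict[int, dict]:
    """Merge today's transit rows into the previous day's integrated data."""
    merged = dict(previous)
    for row in transit:
        merged[row["id"]] = row  # assumption: the newer record wins
    return merged


def output(integrated: dict[int, dict]) -> list[str]:
    """Convert integrated rows to a pipe-delimited output format."""
    return [
        f'{r["id"]}|{r["name"]}|{r["amount"]:.2f}'
        for _, r in sorted(integrated.items())
    ]


source = ["1, Alice ,10.5", "2,Bob,oops", "3,Carol,7", "bad line"]
previous_day = {
    1: {"id": 1, "name": "Alice", "amount": 9.0},
    4: {"id": 4, "name": "Dave", "amount": 1.0},
}

# Each partition is processed independently; in Spark this step would run in
# parallel across nodes.
transit = [row for part in extract(source, 2) for row in process(part)]
result = output(integrate(previous_day, transit))
print(result)
```

In this sketch the two malformed source records are dropped by the metadata check, record 1 is updated from the current day's data, and record 4 is carried over unchanged from the previous day.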

Description

Technical Field

[0001] The present invention relates to an ETL system and its method, in particular to a Spark technology-based ETL system and its method.

Background Art

[0002] With the development of big data, enterprises pay more and more attention to data-related development and application in order to obtain more market opportunities. Big data applications cannot be separated from the cleaning and processing of massive data. Enterprises usually adopt mainstream ETL (extract, transform, and load) products, or code the data processing directly as database stored procedures.

[0003] At present, most mainstream ETL products are based on a stand-alone architecture. When processing massive data, they run into bottlenecks in IO throughput and system resources, and expansion is difficult and expensive. On the other hand, ETL products focus on the ease of use of the operation interface, and each data processing flow is designed graphically, but the metadata manage...

Application Information

IPC(8): G06F17/30
CPC: G06F16/258, G06F16/254
Inventor: 陈涛, 黄卓凡, 张志聪, 李笋, 林志广
Owner: 广东奡风科技股份有限公司