Workflow mechanism-based concurrent ETL (Extract, Transform and Load) conversion method

A conversion method based on workflow technology, applied in the fields of concurrent instruction execution, machine execution devices, multi-programming devices, etc. It addresses the low accuracy and low efficiency of existing ETL tools and achieves high customizability, a high degree of standardization, and concurrent extraction and processing.

Status: Inactive
Publication Date: 2012-10-10
WHALE CLOUD TECH CO LTD

AI Technical Summary

Problems solved by technology

[0007] The purpose of the present invention is to propose a concurrent ETL conversion method based on a workflow mechanism, addressing the low efficiency and low accuracy of current procedural, centralized, and serialized ETL tools.
The method also builds a parallel ETL data extraction engine using cluster-based distributed processing and parallel pipeline technology, which can greatly improve data extraction efficiency and resolves both the parallel processing of multiple data streams and the conversion-processing bottleneck.


Examples

Embodiment Construction

[0036] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0037] As shown in Figure 1, a concurrent ETL conversion method based on the workflow mechanism includes the following implementation steps:

[0038] A. Establish a data-task-oriented workflow engine based on the WfMC model. Workflow instances are distributed over the network to different computing nodes for execution, and the global workflow instances are processed collaboratively through a negotiation mechanism. Because computing nodes can be added easily, the scalability and execution efficiency of the system are greatly improved, and the performance bottleneck of the centralized execution mode is significantly relieved.
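
The distribution of workflow instances described in step A might be sketched roughly as follows. This is only an illustration under stated assumptions: the source does not specify the negotiation mechanism, so a simple "fewest assigned instances" rule stands in for it, and the class name `WorkflowDispatcher`, its methods, and the node and instance names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WorkflowDispatcher {
    // Node name -> workflow instance ids assigned to it (hypothetical bookkeeping).
    private final Map<String, List<String>> assignments = new LinkedHashMap<>();

    public WorkflowDispatcher(List<String> nodeNames) {
        for (String node : nodeNames) {
            assignments.put(node, new ArrayList<>());
        }
    }

    // Scaling out only requires registering another computing node.
    public void addNode(String nodeName) {
        assignments.putIfAbsent(nodeName, new ArrayList<>());
    }

    // Assumption: pick the node holding the fewest instances.
    // Ties go to the node that was registered first.
    public String dispatch(String workflowInstanceId) {
        String target = null;
        for (Map.Entry<String, List<String>> entry : assignments.entrySet()) {
            if (target == null
                    || entry.getValue().size() < assignments.get(target).size()) {
                target = entry.getKey();
            }
        }
        if (target == null) {
            throw new IllegalStateException("no computing nodes registered");
        }
        assignments.get(target).add(workflowInstanceId);
        return target;
    }

    // Number of workflow instances currently assigned to a node.
    public int load(String nodeName) {
        return assignments.get(nodeName).size();
    }
}
```

In this sketch a newly added node starts with zero load, so it receives the next instance; a real engine would also have to track completion and node failure.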

[0039] B. Establish a data extraction and conversion execution engine to execute workflow tasks;

[0040] Design the architecture of the concurrent ETL engine based on RMI remote scheduling...
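
Because the remainder of this paragraph is not available, the following is only a minimal sketch of what RMI-based remote scheduling of an ETL task can look like in Java. The interface `EtlWorker`, the method `runTask`, the binding name `etl-worker`, and the returned status string are all hypothetical; only the `java.rmi` calls are standard library API.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class RmiEtlSketch {
    // Hypothetical remote contract that a scheduler uses to start a task on a node.
    public interface EtlWorker extends Remote {
        String runTask(String taskId) throws RemoteException;
    }

    // Hypothetical worker implementation; a real one would extract and convert data.
    public static class EtlWorkerImpl implements EtlWorker {
        private final String nodeName;

        public EtlWorkerImpl(String nodeName) {
            this.nodeName = nodeName;
        }

        @Override
        public String runTask(String taskId) {
            return taskId + " done on " + nodeName;
        }
    }

    // Strong reference so the exported worker is not garbage collected.
    private static EtlWorkerImpl exportedWorker;

    // Worker side: export the object and bind its stub in a local registry.
    public static Registry startWorker(String nodeName, int registryPort)
            throws RemoteException {
        exportedWorker = new EtlWorkerImpl(nodeName);
        EtlWorker stub = (EtlWorker) UnicastRemoteObject.exportObject(exportedWorker, 0);
        Registry registry = LocateRegistry.createRegistry(registryPort);
        registry.rebind("etl-worker", stub);
        return registry;
    }

    // Scheduler side: look up the worker on a node and invoke the task remotely.
    public static String schedule(String host, int registryPort, String taskId)
            throws Exception {
        Registry registry = LocateRegistry.getRegistry(host, registryPort);
        EtlWorker worker = (EtlWorker) registry.lookup("etl-worker");
        return worker.runTask(taskId);
    }
}
```

A node would call `startWorker` once at startup, and the scheduler would then call `schedule` with that node's host and registry port for each task it assigns.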

Abstract

The invention discloses a workflow mechanism-based concurrent ETL (Extract, Transform and Load) conversion method. By combining workflow technology with multi-threaded concurrency, the method executes multiple ETL tasks of an ETL workflow concurrently and also executes multiple ETL behaviors within a single task concurrently. When several ETL workflows run simultaneously and the workflows and ETL operations contain many parallel branches, execution efficiency increases markedly. In addition, the method constructs cluster-based distributed processing and builds parallel ETL data extraction engines using parallel pipeline technology, so that data extraction efficiency is greatly increased and both the parallel processing of multiple data flows and the conversion-processing bottleneck are resolved.
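
The two levels of concurrency named in the abstract (tasks within a workflow, and behaviors within a task) could be sketched with standard Java thread pools as below. This is an assumption-laden illustration, not the patented engine: the class `ConcurrentEtlSketch`, the pool sizes, and the modelling of a behavior as a `Callable<Integer>` that returns a processed row count are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentEtlSketch {
    // Inner level: run all behaviors of one task concurrently and sum their row counts.
    public static int runTask(List<Callable<Integer>> behaviors,
                              ExecutorService behaviorPool) throws Exception {
        int rows = 0;
        for (Future<Integer> result : behaviorPool.invokeAll(behaviors)) {
            rows += result.get();
        }
        return rows;
    }

    // Outer level: run all tasks of one workflow concurrently.
    // Separate pools are used so that tasks waiting on their behaviors
    // cannot starve the threads those behaviors need.
    public static int runWorkflow(List<List<Callable<Integer>>> tasks) throws Exception {
        ExecutorService taskPool = Executors.newFixedThreadPool(4);
        ExecutorService behaviorPool = Executors.newFixedThreadPool(8);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (List<Callable<Integer>> behaviors : tasks) {
                Callable<Integer> job = () -> runTask(behaviors, behaviorPool);
                pending.add(taskPool.submit(job));
            }
            int total = 0;
            for (Future<Integer> result : pending) {
                total += result.get();
            }
            return total;
        } finally {
            taskPool.shutdown();
            behaviorPool.shutdown();
        }
    }
}
```

Tasks that depend on each other would need ordering on top of this; the sketch only covers branches that are independent and can run in parallel.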

Description

Technical field

[0001] The present invention relates to the innovation and transformation of the traditional procedural ETL model, and in particular to an engine that reconstructs and optimizes the conventional ETL model by combining workflow technology with data parallel processing technology. It involves data extraction technology, data conversion, cleaning and reconstruction technology, workflow technology, data parallel processing technology, load balancing technology and other fields.

[0002] Background technique

[0003] At present, ETL refers to the process of extracting data from data sources while building a data warehouse, and loading it into the data warehouse after data conversion. ETL integrates data collection from data sources, data cleaning, data reconstruction, and data loading into the destination database, data mart, or data warehouse. ETL is the key to building a data warehouse system. However, at present, with the continuous increas...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F9/38, G06F9/46
Inventor: 王渊
Owner: WHALE CLOUD TECH CO LTD