Method and system for arranging big data ETL tasks

A technique for orchestrating big data ETL tasks, applicable to database management systems, digital data processing, structured data retrieval, and similar fields. It addresses poor performance and the difficulty of seamless integration with existing big data frameworks, with the aim of better performance.

Active Publication Date: 2019-09-13
COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI

AI Technical Summary

Problems solved by technology

Because Apache NiFi uses a proprietary distributed computing framework and application container mechanism, it is difficult to integrate seamlessly with big data frameworks such as Hadoop and Spark.
In addition, NiFi's FlowFile-based traceability mechanism often performs very poorly when processing large volumes of data.



Examples


Embodiment Construction

[0044] In order to make the above-mentioned features and advantages of the present invention more comprehensible, the following specific embodiments are described in detail together with the accompanying drawings.

[0045] This embodiment provides a method for orchestrating big data ETL tasks; see Figure 8. The details are as follows:

[0046] 1) The user orchestrates the ETL task according to the requirements. The orchestrated task is designed in the ETL design tool, namely the visualization engine in Figure 8. The design includes configuration of the data processing components (Stops), the data flow paths, and the attributes (Properties).

[0047] 2) The system uses the model description language generator in the visualization engine to generate the model description language, ETLDL, from the ETL task orchestrated by the user, and sends it to the REST API interface.

[0048] 3) The REST API interface receives the model description language ETLDL and forwards it to the model ...
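Steps 1) and 2) above can be pictured with a small sketch. The source does not show the ETLDL syntax, so the JSON layout below is purely an assumption used as a stand-in, and the names `Stop`, `Path`, `generate_description`, and the component kinds such as `LogReader` are hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Stop:
    """One data processing component (a "Stop") with its Property settings."""
    name: str
    kind: str  # hypothetical component type, not taken from the patent
    properties: dict = field(default_factory=dict)


@dataclass
class Path:
    """A data flow path from one Stop to another."""
    source: str
    target: str


def generate_description(flow_name, stops, paths):
    """Serialize a designed flow to JSON (an assumed stand-in for ETLDL)."""
    known = {s.name for s in stops}
    for p in paths:
        # Reject paths that point at Stops the user never configured.
        if p.source not in known or p.target not in known:
            raise ValueError(f"path references unknown Stop: {p}")
    doc = {
        "flow": {
            "name": flow_name,
            "stops": [asdict(s) for s in stops],
            "paths": [asdict(p) for p in paths],
        }
    }
    return json.dumps(doc, indent=2)


stops = [
    Stop("read_logs", "LogReader", {"path": "/data/logs"}),
    Stop("parse", "RegexParser", {"pattern": r"(\S+) (\S+)"}),
    Stop("load", "TableWriter", {"table": "events"}),
]
paths = [Path("read_logs", "parse"), Path("parse", "load")]
description = generate_description("logs_to_warehouse", stops, paths)
print(description)
```

In a setup like this, the resulting string is what the visualization engine would hand to the REST API interface; validating path endpoints at generation time keeps malformed flows from reaching later stages.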



Abstract

The invention relates to a method and a system for arranging big data ETL tasks. The method comprises the following steps: 1) configuring the data processing components and the data flow directions for an ETL task arranged by a user; 2) generating, according to the data processing components and the data flow directions, a model description language for the ETL task arranged by the user; 3) parsing the model description language into a directed acyclic graph of the ETL task, wherein the nodes of the directed acyclic graph are data processing components and the edges are data flow directions; and 4) executing the task through an execution engine according to the directed acyclic graph of the ETL task. During execution, the status of the ETL task can be monitored and its logs can be analyzed. The system comprises a visualization engine, a REST API interface, an execution engine, a monitoring module, and a log module. With this method, the big data ETL process can be configured visually, the running state of the ETL task can be monitored, and a rich, extensible set of data processing components is provided.
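Steps 3) and 4) of the abstract can be sketched as follows. This is a minimal illustration and not the patented execution engine: the dictionary layout of `description`, the `components` mapping, and the function names are all assumptions, and the ordering relies on Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter


def parse_to_dag(description):
    """Map each component name to the set of its upstream components."""
    dag = {name: set() for name in description["stops"]}
    for source, target in description["paths"]:
        if source not in dag or target not in dag:
            raise ValueError(f"unknown component in path {source} -> {target}")
        dag[target].add(source)  # an edge is a data flow direction
    return dag


def execute(description, components):
    """Run every component after all of its upstream components."""
    dag = parse_to_dag(description)
    try:
        order = list(TopologicalSorter(dag).static_order())
    except CycleError as exc:
        raise ValueError("the ETL task is not a directed acyclic graph") from exc
    outputs = {}
    for name in order:
        # Pass upstream results in a stable (sorted by name) order.
        upstream = [outputs[p] for p in sorted(dag[name])]
        outputs[name] = components[name](upstream)
    return order, outputs


# Hypothetical three-step flow: read -> clean -> write.
description = {
    "stops": ["read", "clean", "write"],
    "paths": [("read", "clean"), ("clean", "write")],
}
components = {
    "read": lambda ups: [" a ", "b "],
    "clean": lambda ups: [s.strip() for s in ups[0]],
    "write": lambda ups: len(ups[0]),
}
order, outputs = execute(description, components)
print(order, outputs["write"])
```

Rejecting cyclic descriptions before any component runs means a malformed flow fails fast instead of partway through a job. A real engine would dispatch each node to a distributed framework rather than call a local function.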

Description

Technical field

[0001] The invention relates to the technical fields of big data, pipelines, visualization, and distributed systems, and proposes an orchestration method and system for big data ETL tasks.

Background technique

[0002] In traditional data analysis scenarios, most applications are management information systems, and their data is stored in relational databases. To meet analysis requirements without affecting business operations, the data must be extracted, transformed, and loaded into a relational data warehouse through the ETL (Extract-Transform-Load) process for offline analysis and processing. However, because of limits on data volume and computing power, the processing applied to the data is often relatively simple.

[0003] With the advent of the big data era, data analysis and processing scenarios are no longer limited to traditional relational databases; they now include, for example, massive log data, stream data, and real-ti...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/25, G06F16/2455
CPC: G06F16/24568, G06F16/254
Inventor: 朱小杰, 沈志宏, 杜一, 赵子豪, 周园春
Owner: COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI